FROM THE EDITOR


This book begins a new effort by BYTE Publications to provide our readers with the best 
available manuscripts on the major topics of interest to the home computerist. Included in the 
new series of BYTE’s books are reprints of the best articles from past issues of BYTE magazine,
plus new material which has not been printed anywhere before. The books will be organized in 
logical volumes of related topics. This provides the reader with vital information from previous 
BYTE issues which he or she might have missed, new material that has not appeared in the 
magazine, plus a book covering one specific theme for quick, easy reference. 

Manuscripts included in these books are of the same high quality as those found in BYTE magazine, because we use the same stringent criteria in selecting new manuscripts for inclusion in these books as we do in choosing them for the magazine. Generally, the one additional factor that sends a manuscript to the books rather than to the magazine is length: the magazine must limit how long its articles can be. In addition, we receive so many quality manuscripts that we could never possibly include them all in BYTE magazine. Therefore, in our efforts to give the reader all the information needed to be a successful microcomputerist, we have decided to make these manuscripts available in book form.

The book that you are holding in your hands is the first in a series on the general topic of Programming Techniques. This particular book deals with the details of the theory behind the design of the various aspects of programs. Anyone who has programmed for any length of time will agree that the most critical part of writing a program of any kind (application, system software, etc) is the design phase, both the initial specifications and the program logic design. Once the initial design of the program has taken place, the actual coding is largely a mechanical process. It follows that unless the original design of the program is correct, the program cannot be expected to work to its specifications.

The purpose of this book, then, is to provide the personal computer user with the techniques needed to design efficient, effective, maintainable programs. Included in the topics covered are structured program design, modular programming techniques, program logic design, and examples of some of the more common traps the casual, as well as the experienced, programmer may fall into. In addition, details on various aspects of the actual program functions, such as hashed tables and binary tree processing, are included.

Further books in this series will make available new techniques and further developments of 
the existing ones as they occur. This will allow you, the personal computer user, to stay up to 
date with the current technology of programming skills. 


Blaise W. Liffick 
Editor 


PROGRAM STRUCTURE 


About This Section 


For the last several years, those of us whose profession has been programming (applications, systems, scientific, whatever) have been bombarded on all sides with the latest philosophies of programming: structured, modular, top down, bottom up, GOTOless, etc. Not only do we get encouragement from employers to embrace whatever the most popular current technique for coding is, but we also get it from others in our profession who are adherents to one philosophy or another. This is not to imply that the techniques lack merit, but most of the coding philosophers are talking about just that: coding. The main thrust of their arguments is against poor coding practices. And that’s just fine. But they forget one important detail: once a program has been designed, all the coding techniques in the world are generally ineffective, because the major portion of the program logic has already been set! The specifications and initial design of the program predetermine to a great extent how the coding can be performed.

In the following section, the techniques for designing effective programs are presented. Both the amateur and professional programmer will profit from these practical techniques of design by being able to produce essentially error-free code. And for the amateur programmer there is an added bonus for following these practices: instant documentation! By carefully designing the function of the program before ever coding a single line, you ensure that once the coding is completed, you can add to the program at any time. The code written so long ago will be easily understood, and you will know where and how to make any necessary changes.

In addition, if everyone followed similar guidelines for designing programs, trading programs would be a painless and easy way to expand your program library. You could instantly understand what anyone else’s program was doing. And while someone’s 6800 code definitely will not run on your 8080 machine, the program has already been totally designed and can be easily coded into any other language!


Structured Program Design


David A Higgins


In the world of electronics, no experimenter in his right mind would build a circuit by throwing a few parts together with some wire and some hope, then attaching a line cord and plugging it in to see if it works. Not only are you likely to destroy some very expensive parts, but it is also a good way to get fried, or at least get a new hairdo.

Yet, after all the trouble that a serious microcomputer hobbyist will take to insure that his circuit is put together correctly before he ever turns it on, he will invariably try to program his new computer by using a technique analogous to the one above. That is why his programs almost never run right the first time, if indeed they ever manage to run right at all. It is also why many microcomputer buffs stay up until odd hours of the night drinking coffee by the gallon in an effort to find that one little bug.

But there is hope. I’m sure that nearly everyone involved with computers has heard something about structured programming in one form or another. It is not really a new technique, having been preached about for many years. However, the tools and methodologies available to design programs have changed radically over the years.

In the beginning there were flowcharts, which looked like five-dimensional octopi or the corporate structure of a conglomerate. Despite the absence of a consistent approach that would enable everyone to design a program using flowcharts, those programmers who did bother to work out their problem with a flowchart first usually seemed to have more luck in getting programs to run sooner and better than programmers who did not.


    "BUG" PROGRAM
        BEGIN PROGRAM
            GET PLAYER'S NAME
            ASK IF PLAYER WANTS INSTRUCTIONS
            YES (0,1)
                PRINT GAME INSTRUCTIONS
            ⊕
            NO (0,1)
        GAMES (1,g)
            BEGIN GAME
                INITIALIZE BUG PARTS
            TURNS (1,t)
                PLAYER'S TURN        see figure 2
                COMPUTER'S TURN      see figure 2
                END TURN             see figure 3
            END GAME
                GET NEW PLAYER'S NAME
        END PROGRAM
            SKIP

Figure 1: The Warnier-Orr diagram showing the basic structure of the BUG program.


Structuring Tools 


The development of mathematics would surely have been stymied if Roman numerals had been retained as our number system. In much the same manner, the science of structured program design would have been mired down if only flowcharts had been available for developing programs. It is not that calculus is impossible with Roman numerals; it’s just that it’s extremely difficult. Thus, over the years, a number of design and documentation tools were developed to better enable a programmer to understand the problem before going out to do battle with the program.


Figure 2: Diagram of the logic for the PLAYER and COMPUTER TURNS routines of the BUG program. Note that an item written with a bar over it means “the complement of item.”

TOP-DOWN or GOTO-less programming, developed by Dijkstra and others, was probably the first major attempt to solve the design versus coding problem. Dijkstra simply observed that the more GOTOs there were in a program, the less likely it was to run correctly. Dijkstra called such programs “spaghetti bowl” programs, because if you drew a line from each GOTO in the program to its destination, you ended up with a mess that looked like a bowl of spaghetti. He showed how any program could be written with just a few simple flow structures and without any GOTOs. His techniques produced simple, readable code that was easy to test and maintain. So, the big push among design aficionados was to eliminate the GOTOs in their programming. Although TOP-DOWN programming was a big advancement over flowcharting, it was just that: programming. It was a technique for coding a program, not necessarily for designing it.

Another technique, IBM’s HIPO (and later HIPO-DB), entered the design field almost by chance, being primarily a documentation tool that was also being used for program design. The major drawback to HIPO techniques, besides the fact that they did not work well for designing a program, was their tendency to produce 50 pages of documentation for a three page program.


Warnier-Orr Diagrams — A New Approach 


Within the last four years a new technique for program design has evolved from the work of Jean-Dominique Warnier (pronounced warn’-yay) in France, and Kenneth T Orr of Langston, Kitch and Associates in Topeka KS. The technique has foundations in set theory and Boolean algebra, and holds much promise for program design applications. Warnier-Orr diagrams, as we have called them here in the United States, allow programmers to design faster than ever before, to code programs with little or no effort, and to produce programs that usually run correctly the first time. The approach is not limited to small programs. Nothing will make a believer out of someone quicker than a 20 page COBOL program which runs correctly the first time. The Warnier-Orr technique stresses design over coding and contends that once a problem is designed, it does not matter what programming language you code it in! At Langston, Kitch and Associates, people have used the technique to program in COBOL, PL/I, ALGOL, FORTRAN, BASIC, RPGII and assembler languages. It works equally well for all of them.


Warnier-Orr Diagram 


The simplest way to learn about Warnier-Orr diagrams is to see examples of them. Warnier-Orr diagrams are very easy to learn and use; however, be forewarned that this is a technique that is sometimes deceptively simple, but not as trivial as it often seems.

Let’s consider the relatively simple game of BUG. In this game the computer rolls a die, once for itself and once for its opponent. Each number of the die corresponds to a part of the BUG’s anatomy: 1 = BODY, 2 = NECK, 3 = HEAD, 4 = ANTENNAE, 5 = TAIL, and 6 = LEGS. The object of the game is to finish your bug before the computer finishes its bug. Other rules: you must have a body before you can have legs, a neck or a tail; you must have a neck before you can have a head; and you must have a head before you can have antennae. One body, one neck, one head, one tail, six legs and two antennae are needed to complete a bug. Figure 1 is a Warnier-Orr diagram showing the basic structure of the BUG program.
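The rules above can be written down as plain data before any game logic exists, which is very much in the spirit of designing first. What follows is a minimal Python sketch, not part of the original BASIC program: the part names and quantities come from the rules, while the dictionary representation and the function names `new_bug`, `can_add` and `is_complete` are assumptions made for illustration.

```python
# Die face -> part of the bug, as listed in the rules.
DIE_PARTS = {1: "BODY", 2: "NECK", 3: "HEAD", 4: "ANTENNAE", 5: "TAIL", 6: "LEGS"}

# How many of each part a finished bug needs.
REQUIRED = {"BODY": 1, "NECK": 1, "HEAD": 1, "ANTENNAE": 2, "TAIL": 1, "LEGS": 6}

# The part that must already be present before this one may be added.
PREREQUISITE = {"NECK": "BODY", "TAIL": "BODY", "LEGS": "BODY",
                "HEAD": "NECK", "ANTENNAE": "HEAD"}


def new_bug() -> dict:
    """A bug with no parts yet (assumed form: part name -> count collected)."""
    return {part: 0 for part in REQUIRED}


def can_add(bug: dict, part: str) -> bool:
    """True if the rules allow one more `part` to be added to `bug`."""
    if bug[part] >= REQUIRED[part]:
        return False                    # already has all it needs of this part
    needed = PREREQUISITE.get(part)     # BODY has no prerequisite
    return needed is None or bug[needed] > 0


def is_complete(bug: dict) -> bool:
    """True once every part is present in the required quantity."""
    return all(bug[part] == count for part, count in REQUIRED.items())
```

The required counts add up to 12, the same total that Listing 1 tests for in CNT when it decides whether a bug is finished.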

The Warnier-Orr diagram is read left to right, top to bottom, just like conventional English text. The brackets enclose logically related operations, the largest of which is the program itself. The BUG program is composed of three logical sections:


• The BEGIN PROGRAM section, where the player’s name is requested and there is an explanation of the game rules. Note that the ⊕ symbol between the modules YES and NO denotes the exclusive OR function, meaning that one or the other but not both of the modules will be performed. Observe also that this is reflected in the number of times that each module may be performed: 0 if the condition is false and 1 if the condition is true.

• The process section, GAMES, where the playing of the game actually takes place. The (1,g) denotes that the section is to be performed at least once, and possibly many (g) times.

• The END PROGRAM section, which in this case is empty, but which usually contains things such as the closing of files, the goodbye message, etc.
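The three sections map naturally onto code: a BEGIN part performed once, a (1,g) part performed at least once, and an END part. As a hedged sketch in Python rather than the BASIC of Listing 1, the top-level brackets of figure 1 might look like this; the helper names `ask`, `play_one_game` and `print_rules` are hypothetical stand-ins for console input and for the contents of the lower brackets. The (1,g) repetition becomes a loop that tests its condition after the body, so the body always runs at least once.

```python
from typing import Callable


def print_rules() -> None:
    # Hypothetical stand-in for the "explanation of the rules" module.
    print("THE GAME OF BUG IS PLAYED AS FOLLOWS:")


def bug_program(ask: Callable[[str], str], play_one_game: Callable[[], None]) -> int:
    """Top-level brackets of figure 1; returns how many games were played."""
    # BEGIN PROGRAM: performed exactly once.
    ask("ENTER YOUR FIRST NAME")
    if ask("DO YOU WANT AN EXPLANATION OF THE RULES?") == "YES":
        print_rules()  # YES (0,1)
    # NO (0,1): nothing to do. Exactly one of YES/NO applies (exclusive OR).

    # GAMES (1,g): the body runs first and the test comes afterward,
    # so at least one game is always played.
    games = 0
    while True:
        play_one_game()
        games += 1
        if ask("DOES ANYONE ELSE WANT TO PLAY") != "YES":
            break

    # END PROGRAM: empty for BUG (would close files, say goodbye, etc.).
    return games
```

Passing the input routine in as a parameter is only a convenience for trying the structure out; the bracket-to-code correspondence is the point.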


The rest of the brackets decompose in a similar fashion. The GAMES procedure breaks down into the beginning of the game (BEGIN GAME), the turns that each player takes (TURNS), and the end of the game (END GAME).

Notice that logically there are things that 
only happen at the beginning of the program 
and things that only happen during the play- 
ing of the game itself. The Warnier-Orr di- 





    END TURN
        HAS EITHER PLAYER COMPLETED A BUG
        YES (0,1)
            COMPUTER WINS (0,1)
                DISPLAY “I WIN”
            ⊕
            OPPONENT WINS (0,1)
                DISPLAY “YOU WIN”
            DECLARE END OF GAME
        ⊕
        NO (0,1)
            ON TO NEXT TURN

Figure 3: Warnier-Orr diagram for the ending of a turn or a game.





Listing 1: A structured BASIC program that was written using the Warnier-Orr diagrams of figures 1 through 3. This code executed correctly the first time even though it was the author's first attempt at writing a BASIC program.

REM BUG PROGRAM
REM BEGIN PROGRAM
DIM HEAD(2), BODY(2), LEGS(2), TAIL(2), ANTE(2), NECK(2), CNT(2)
GOSUB 120
REM GAMES (1,G)
LET EPGM = 0
GOSUB 200
IF EPGM = 0 THEN GOTO 70
REM END PROGRAM
STOP
REM BEGIN PROGRAM SUBROUTINE
PRINT 'ENTER YOUR FIRST NAME'
INPUT NAME$
PRINT 'DO YOU WANT AN EXPLANATION OF THE RULES; ENTER YES OR NO.'
INPUT ANS$
LET TEST = SCOMP('YES', ANS$)
IF TEST = 0 THEN GOSUB 1200 ELSE ;
RETURN
REM GAMES SUBROUTINE
REM BEGIN GAME
GOSUB 290
REM TURNS (1,T)
LET EGAM = 0
GOSUB 390
IF EGAM = 0 THEN 230
REM END GAME
GOSUB 1150
RETURN
REM BEGIN GAME SUBROUTINE
LET BODY(1), BODY(2) = 0
LET CNT(1), CNT(2) = 0
LET NECK(1), NECK(2) = 0


Listing 1, continued:

310 LET HEAD(1), HEAD(2) = 0
320 LET ANTE(1), ANTE(2) = 0
330 LET TAIL(1), TAIL(2) = 0
340 LET LEGS(1), LEGS(2) = 0
350 RETURN
360 REM TURNS SUBROUTINE
370 REM PLAYERS TURN
380 REM LET PLAYER START TURN
390 PRINT 'HIT RETURN TO ROLL DIE'
400 INPUT A
410 LET PLAY = 1
420 GOSUB 520
430 REM COMPUTERS TURN
440 LET PLAY = 2
450 GOSUB 520
460 REM END TURN
470 GOSUB 1060
480 RETURN
490 REM TURN SUBROUTINE
500 REM PLAY=1: PLAYERS TURN; PLAY=2: COMPUTERS TURN
510 REM ROLL DIE
520 LET ROLL = FIX(RND(0)*6.0) + 1
530 PRINT : "ROLL IS A ", ROLL
540 IF ROLL = 1 THEN IF BODY(PLAY) #1 THEN GOSUB 690 ELSE ; ELSE ;
550 IF ROLL = 1 THEN 650
560 IF ROLL = 2 THEN IF BODY(PLAY) = 1 THEN IF NECK(PLAY) #1 THEN GOSUB 760
570 IF ROLL = 2 THEN 650
580 IF ROLL = 3 THEN IF BODY(PLAY) = 1 THEN IF NECK(PLAY) = 1 THEN IF HEAD(PLAY) #1 THEN GOSUB 820
590 IF ROLL = 3 THEN 650
600 IF ROLL = 4 THEN IF HEAD(PLAY) = 1 THEN IF ANTE(PLAY) #2 THEN GOSUB 880
610 IF ROLL = 4 THEN 650
620 IF ROLL = 5 THEN IF BODY(PLAY) = 1 THEN IF TAIL(PLAY) #1 THEN GOSUB 940
630 IF ROLL = 5 THEN 650
640 IF ROLL = 6 THEN IF BODY(PLAY) = 1 THEN IF LEGS(PLAY) #6 THEN GOSUB 1000
650 RETURN
690 REM BODY SUBROUTINE
700 IF PLAY = 1 THEN PRINT : NAME$, "'S BUG HAS A BODY"
710 IF PLAY = 2 THEN PRINT : "COMPUTER'S BUG HAS A BODY"
720 LET CNT(PLAY) = 1
730 LET BODY(PLAY) = 1
740 RETURN
750 REM NECK SUBROUTINE
760 IF PLAY = 1 THEN PRINT : NAME$, "'S BUG HAS A NECK"
770 IF PLAY = 2 THEN PRINT : "COMPUTER'S BUG HAS A NECK"
780 LET CNT(PLAY) = CNT(PLAY) + 1
790 LET NECK(PLAY) = 1
800 RETURN
810 REM HEAD SUBROUTINE
820 IF PLAY = 1 THEN PRINT : NAME$, "'S BUG HAS A HEAD"
830 IF PLAY = 2 THEN PRINT : "COMPUTER'S BUG HAS A HEAD"
840 LET CNT(PLAY) = CNT(PLAY) + 1
850 LET HEAD(PLAY) = 1
860 RETURN
870 REM ANTENNAE SUBROUTINE
880 LET ANTE(PLAY) = ANTE(PLAY) + 1
890 IF PLAY = 1 THEN PRINT : NAME$, "'S BUG HAS ", ANTE(1), " ANTENNAE."
900 IF PLAY = 2 THEN PRINT : "COMPUTER'S BUG HAS ", ANTE(2), " ANTENNAE."
910 LET CNT(PLAY) = CNT(PLAY) + 1
920 RETURN
930 REM TAIL SUBROUTINE
940 IF PLAY = 1 THEN PRINT : NAME$, "'S BUG HAS A TAIL"
950 IF PLAY = 2 THEN PRINT : "COMPUTER'S BUG HAS A TAIL"
960 LET CNT(PLAY) = CNT(PLAY) + 1
970 LET TAIL(PLAY) = 1
980 RETURN
990 REM LEGS SUBROUTINE
1000 LET LEGS(PLAY) = LEGS(PLAY) + 1
1010 IF PLAY = 1 THEN PRINT : NAME$, "'S BUG HAS ", LEGS(1), " LEGS."
1020 IF PLAY = 2 THEN PRINT : "COMPUTER'S BUG HAS ", LEGS(2), " LEGS."
1030 LET CNT(PLAY) = CNT(PLAY) + 1
1040 RETURN
1050 REM END TURN SUBROUTINE
1060 IF CNT(1) = 12 THEN 1090
1070 IF CNT(2) = 12 THEN 1110
1080 GOTO 1130


agrams allow you to see very easily just 
where and when a particular event must take 
place. After examining figure 1 carefully to 
make sure that you understand how the 
diagrams work, move on to the explanation 
of the PLAYER and COMPUTER TURNS 
section shown in figure 2. 

In figure 2, we have represented the logic for each of the players’ turns during the game. At the beginning of each turn, the die is rolled to determine the part of the BUG’s body that the player may receive. Whatever the roll, we then have a logical path to follow. Again, please note that the presence of the ⊕ between each of the possible rolls denotes mutual exclusion, i.e., only one of the paths may be selected. This particular structure is known as a case statement.

If the player rolls a 4, we first find the instructions to follow for a roll of 4 and check to see if the player has a BUG head. If he does, we then check to see whether or not the player already has two antennae. If he does, then we do nothing. If he does not have two antennae yet, we give him one antenna. If he does not have a BUG head, then again we do nothing. In a similar fashion, all of the possible rolls and their associated procedures are explained. Now let’s move on to the Warnier-Orr diagram for the end of the turn, which is shown in figure 3.
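The case structure of figure 2 translates almost line for line into an if/elif chain with one branch per die face, of which exactly one is taken. The sketch below is in Python rather than the BASIC of Listing 1; representing a bug as a dictionary of part counts, and the names `new_bug`, `take_turn` and `roll_die`, are assumptions for illustration. It follows the roll-of-4 walk-through above: with no head, or with two antennae already, the roll does nothing.

```python
import random
from typing import Optional


def new_bug() -> dict:
    """A bug with no parts yet (assumed form: part name -> count)."""
    return {"BODY": 0, "NECK": 0, "HEAD": 0, "ANTENNAE": 0, "TAIL": 0, "LEGS": 0}


def take_turn(bug: dict, roll: int) -> Optional[str]:
    """Apply one die roll, choosing exactly one case as in figure 2.

    Returns the name of the part added, or None when the roll does nothing.
    """
    if roll == 1:    # BODY: only if the bug has no body yet
        part, allowed = "BODY", bug["BODY"] < 1
    elif roll == 2:  # NECK: needs a body
        part, allowed = "NECK", bug["BODY"] == 1 and bug["NECK"] < 1
    elif roll == 3:  # HEAD: needs a neck (and therefore a body)
        part, allowed = "HEAD", bug["NECK"] == 1 and bug["HEAD"] < 1
    elif roll == 4:  # ANTENNAE: needs a head, at most two
        part, allowed = "ANTENNAE", bug["HEAD"] == 1 and bug["ANTENNAE"] < 2
    elif roll == 5:  # TAIL: needs a body
        part, allowed = "TAIL", bug["BODY"] == 1 and bug["TAIL"] < 1
    elif roll == 6:  # LEGS: needs a body, at most six
        part, allowed = "LEGS", bug["BODY"] == 1 and bug["LEGS"] < 6
    else:
        raise ValueError(f"a die has no face {roll}")
    if allowed:
        bug[part] += 1
        return part
    return None


def roll_die(rng: random.Random) -> int:
    """ROLL DIE: a number from 1 to 6."""
    return rng.randint(1, 6)
```

Keeping the die roll separate from `take_turn` lets each case be checked with a chosen roll instead of a random one.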

If either player has won the game at the 
end of a turn, the computer declares the 
winner and ends the game. If neither player 
has won, the computer does nothing and 
cycles through for another turn. 
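Figure 3’s YES/NO choice can likewise be sketched as a function that either names a winner or lets play continue. This Python version is an illustration only; the dictionary representation and the names `finished` and `end_turn` are assumed. Like Listing 1, which tests the player’s count before the computer’s, it checks the player’s bug first, so the player wins if both bugs are completed on the same turn.

```python
from typing import Optional

# Parts a finished bug needs; they total 12, the count Listing 1 tests in CNT.
REQUIRED = {"BODY": 1, "NECK": 1, "HEAD": 1, "ANTENNAE": 2, "TAIL": 1, "LEGS": 6}


def finished(bug: dict) -> bool:
    """HAS THIS PLAYER COMPLETED A BUG?"""
    return all(bug.get(part, 0) >= count for part, count in REQUIRED.items())


def end_turn(player: dict, computer: dict) -> Optional[str]:
    """END TURN from figure 3.

    YES (0,1): someone has finished, so return the message to display;
    the caller then declares the end of the game.
    NO (0,1): return None and go on to the next turn.
    """
    if finished(player):        # checked first
        return "YOU WIN"
    if finished(computer):
        return "I WIN"
    return None
```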


Structured Programming 


Once the problem is fully understood, coding the BUG program is a simple and straightforward process. For this particular example I coded the program shown in listing 1 in a version of BASIC.

As you can see, each bracket of the original Warnier-Orr diagram roughly corresponds to a subroutine in the finished code: the process GAMES, for instance, becomes the subroutine at line number 180 which is called repeatedly by the branch at line 80 until EPGM equals 1, indicating that no more games are to be played; the process BEGIN PROGRAM is handled by the subroutine at line 110, and so forth. The resultant code is:


• easy to read and understand
• easy to change and maintain
• already documented
• logically correct.


It is also a program that will run correctly the first time, barring unforeseen syntax errors for those of us who can’t type or spell. All of this is possible because the program was thoroughly designed before it was even partially coded.


Conclusion 


Warnier-Orr diagrams are a giant leap in the right direction for structured programming. They represent an attitude which, for the first time since people have been playing with computers, can lead to consistently reliable software that is very easy to maintain. Currently, most data processing departments spend over 80% of their time and effort repairing old code that has suddenly gone bad. Warnier-Orr diagrams also provide the means to produce software of a quality that has never before been possible.

If you think that you are interested in using Warnier-Orr diagrams to help you solve some of your software headaches, by all means try them. But as I mentioned above, this technique looks deceptively simple, and you may not have much success. Understanding a diagram such as the one presented in this text is one thing; creating one from scratch is another.

If you do get bogged down, please feel free to write us for more information. If you try them, like them, and think you’ve done something exciting with them, again feel free to write us and tell us what you’ve done.


Listing 1, continued:

1090 PRINT : NAME$, "'S BUG IS FINISHED, YOU WIN"
1100 GOTO 1120
1110 PRINT : "COMPUTER'S BUG IS FINISHED, I WIN"
1120 LET EGAM = 1
1130 RETURN
1140 REM END GAME SUBROUTINE
1150 PRINT : "DOES ANYONE ELSE WANT TO PLAY"
1160 INPUT ANS$
1165 LET TEST = SCOMP(ANS$, 'YES')
1170 IF TEST #0 THEN LET EPGM = 1
1180 RETURN
1190 REM EXPLANATION OF RULES SUBROUTINE
1200 PRINT "THE GAME OF BUG IS PLAYED AS FOLLOWS:"
1210 PRINT " A DIE IS ROLLED BY THE COMPUTER, AND EACH NUMBER"
1220 PRINT " ON THE DIE CORRESPONDS TO A PART OF THE BUG'S"
1230 PRINT " BODY: 1=BODY, 2=NECK, 3=HEAD, 4=ANTENNAE, 5=TAIL,"
1240 PRINT " 6=LEGS. YOU NEED 1 BODY, 1 NECK, 1 HEAD, 2 ANTENNAE,"
1250 PRINT " 1 TAIL, AND 6 LEGS TO COMPLETE A BUG."
1260 PRINT " THE OBJECT OF THE GAME IS TO BUILD YOUR BUG BEFORE"
1270 PRINT " THE COMPUTER BUILDS HIS."
1280 PRINT " HIT RETURN WHEN YOU ARE READY TO PLAY."
1290 INPUT A
1300 RETURN
Structured Programming with Warnier-Orr Diagrams


David A Higgins


Part 1: Design Methodology


Any successful program design methodology must be able to do several things: it must produce consistent, low cost, high reliability results; it must produce them quickly, while still allowing for easy maintenance later; and it must be simple enough to allow anyone (and I do mean anyone) to use it. Warnier-Orr diagrams (after Jean-Dominique Warnier in France and Kenneth T Orr in the United States) satisfy all of the above requirements with an added bonus: they produce structured programs that nearly always run correctly at the first effective trial. They allow people to produce superprograms without being superprogrammers.

The purpose of this article is to show how to develop and code a structured program using the Warnier-Orr methodology from start to finish. The technique is a straightforward approach to producing correct programs. It is just as valid and successful for personal microcomputer applications as it is for megacomputer applications in the world of business, science and industry. I feel that this method of designing a program is one of the most advanced state of the art software development techniques in existence today. It is a concise, step by step method with predictable results.


Step One: Identify the Output 


This is the first, the primary and the most 
important rule of all for the construction of 
a correct program. It cannot be emphasized 
enough. The failure to first identify the 
outputs of a program is usually the primary 
reason programs fail to run correctly. 

You must ask yourself the questions: “How will I be able to tell when I am through with this program?” “What will the printed, displayed and punched outputs physically look like?” “What will the program be able to do?” All of these questions must be thoroughly answered before you can even begin to think of coding the program. Skipping this step because “Aw, I know what I want to do,” or “Gee, this isn’t any fun, let’s start coding,” is a common mistake, and although you may get away with it on a small program once in a while, omitting it will kill you more often than not.

A good example of the kind of trouble you can get into by assuming that you know everything about a problem can be found in a recent popular film. In the movie Jeremiah Johnson, Jeremiah befriends an old hunter and trapper in the mountains. The old hunter asks Jeremiah if he can skin a bear. “Of course I can,” he replies. In the next scene, we see the old man running down a hill towards the cabin closely pursued by a very large bear. The hunter runs into the open front door, leaps out of the back window and yells: “There ... you skin that one and I’ll go get you another.” Jeremiah failed to do one basic thing; he forgot to ask whether the bear he was supposed to skin was dead. Skinning a dead bear is one thing; skinning one that is still running around the room trying to skin you is quite another. In the same way, writing a program after it has been properly defined is one thing, and trying to write one when you aren’t even sure what it is supposed to do when you are finished is another.

Defining outputs is not really an unreasonable requirement to make; after all, no building contractor would begin construction without first knowing what the finished building was supposed to look like; no electrical engineer would start soldering parts together without a schematic diagram. In fact, no profession (reliable profession anyway) involved in the business of putting things together ever starts to build anything unless they know what it will look like after they are done. Yet, that is precisely the way most programmers try to write programs. Then they wonder what went wrong when they have problems. The same programming principles which apply to the professional apply just as much to the amateur, for no one’s time is unlimited.

After defining all of the outputs of the program, the next step is to define the logical data base, although you will probably never really spend much time at this step with most personal microcomputer applications.


Step Two: Define the Logical Data Base 


This step is trivial for many personal use applications because the logical data base typically consists of only one numeric field: the field holding a person’s response to a program-generated question. For illustrative purposes let us look at a home computer application that requires a slightly more complex data base arrangement. Take for instance a computer program that would balance the family checkbook and produce a financial report each month. The report designed in step one might look something like figure 1.

If you were keeping manual records that you wanted to be able to search very easily, you would keep each one of those entries, perhaps on index cards, filed by year, by month and by date. Figure 2 illustrates a way of representing the logical data structure for the checkbook balance report in Warnier-Orr notation.

In figure 2, you can see the logical data structure for the checkbook balance report. The report is organized by year; within each year by months; within each month by days; and within each day by transactions, which are either debits (checks) or credits (deposits). Note that year, month, day, and transactions all appear in the report at least once and possibly many times; thus we see the notation (1,n) in the diagram. Having an entry for a day that had no transactions, or a monthly report with no days, is hardly worth the trouble. However, each transaction is either a credit transaction (credit occurring once, and debit not occurring) or a debit transaction (debit occurring once and credit not occurring). This condition is reflected on the chart by the ⊕ symbol, which is the symbol for mutual exclusion.
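One way to see that figure 2 is really a data structure is to write it as one. The following Python sketch is illustrative only: the class and field names are assumptions, amounts are kept in whole cents to avoid rounding, and the (1,n) rule is enforced by refusing empty lists. The Debit/Credit union plays the part of the mutual exclusion symbol, since each transaction is exactly one of the two.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Debit:
    """A check: debit occurs once, credit does not."""
    check_number: int
    to: str
    amount_cents: int


@dataclass
class Credit:
    """A deposit: credit occurs once, debit does not."""
    description: str
    amount_cents: int


# Each transaction is a Debit or a Credit, never both.
Transaction = Union[Debit, Credit]


def _at_least_one(items: list, what: str) -> None:
    # Enforces the (1,n) notation: an empty day, month or year is rejected.
    if not items:
        raise ValueError(f"a {what} needs at least one entry")


@dataclass
class Day:
    day: int
    transactions: List[Transaction]   # (1,t)

    def __post_init__(self) -> None:
        _at_least_one(self.transactions, "day")


@dataclass
class Month:
    name: str
    days: List[Day]                   # (1,d)

    def __post_init__(self) -> None:
        _at_least_one(self.days, "month")


@dataclass
class Year:
    year: int
    months: List[Month]               # (1,m)

    def __post_init__(self) -> None:
        _at_least_one(self.months, "year")


def net_change_cents(month: Month) -> int:
    """Credits minus debits for one month."""
    total = 0
    for day in month.days:
        for t in day.transactions:
            total += t.amount_cents if isinstance(t, Credit) else -t.amount_cents
    return total
```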

MONTHLY FINANCIAL REPORT
FOR THE MONTH OF JANUARY 1977

BALANCE FORWARD OF $231.90

DATE  CHECK#  TO:                       DEBIT   CREDIT  BALANCE
   1     978  GROCERY STORE              2.23            229.67
              -MILK, BREAD, EGGS
   1     979  PHONE COMPANY             37.14            192.53
   3     980  GAS BILL                  25.61            166.92
   5     981  GEORGE FREDRICK            5.00            156.92
              -SHOVELLING SNOW
   2          PAYCHECK DEPOSIT                  312.18   469.10
   6     982  ELECTRIC COMPANY          23.15            445.95
  31    1013  BYTE MAGAZINE             12.00            237.11
              -SUBSCRIPTION RENEWAL
              CURRENT BALANCE                            237.11

Figure 1: Proposed output of a computer program for balancing a checkbook and producing an end of month report.


One important point needs to be made here. The diagram of figure 2 is not the logical data base for this report; it is only the report’s logical data structure. Making a chart of the logical data base requires that we map the data elements that appear in the report onto the logical report structure, as we have done in figure 3. In figure 2 we showed conceptual relationships of one part of the structure to another. In figure 3 we’ve filled in the required details needed to complete each level of the structure. One level of the structure corresponds to one bracket, and the levels are counted left to right.


Step Three: Define the Physical Data Base 


Defining the physical data base of a program is largely a packaging decision: what physical arrangement of the data in the computer will best suit the needs of the program. The only help I can give you on this is the simple suggestion that the physical representation should mirror the logical representation in all but the most extreme cases. These are hardware decisions. You may wish to construct a file one way if you are using a cassette tape storage system; you may construct it another way if you have a floppy disk. You would not want to impose a file structure that forced a cassette tape to behave like a disk by running back and forth through the tape at high speed. That is a good way to burn up a tape drive in a hurry. Ultimately, as memories become faster, more versatile and more efficient, the physical data base will probably always be able to mirror the logical data base. Magnetic bubble memories, for instance, have no moving parts to burn up.


    YEAR (1,y)
        MONTH (1,m)
            DAY (1,d)
                TRANSACTIONS (1,t)
                    DEBIT (0,1)
                    ⊕
                    CREDIT (0,1)

Figure 2: Logical data structure for the checkbook balance report. The notation (1,n) indicates an operation will take place at least once and up to n times, inclusive.


    CHECK FILE
        YEAR (1,y)
            YEAR NUMBER
            MONTH (1,m)
                NAME OF MONTH
                BALANCE FORWARD
                DAY (1,d)
                    DAY NUMBER
                    TRANSACTIONS (1,t)
                        DEBIT (0,1)
                            CHECK NUMBER
                            “TO” DESCRIPTION
                            “FOR” DESCRIPTION
                            AMOUNT
                        ⊕
                        CREDIT (0,1)
                            CREDIT DESCRIPTION
                            AMOUNT
                        BALANCE AMOUNT
                CURRENT BALANCE

Figure 3: The logical data base is generated by mapping the data elements that appear in the report onto the logical data structure.

In the checkbook balance report program 
the simplest physical data base would be a 
sequential file. The necessary information 
and a brief description of each transaction 
could be stored in the order shown in figure 
4, read left to right. 

Given that we have a file with this 
information on it which is sorted by year, 
month, day and transaction, producing a 
report program is almost a trivial exercise. 
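To make that claim concrete, here is a minimal Python sketch of reading such a sequential file and carrying the running balance shown in figure 1. The comma-separated text layout, the D/C flag values and the YYYYMMDD date form are all assumptions; the article does not say how the fields of figure 4 are encoded. Decimal is used so that money amounts add exactly, and the sample rows are the first three entries of figure 1.

```python
import csv
import io
from decimal import Decimal
from typing import Iterator, Tuple

# Hypothetical text layout for the figure 4 record, one transaction per line:
#   date (YYYYMMDD), flag (D = check/debit, C = deposit/credit),
#   check or deposit number, description 1, description 2, amount
SAMPLE = """19770101,D,978,GROCERY STORE,MILK BREAD EGGS,2.23
19770101,D,979,PHONE COMPANY,,37.14
19770103,D,980,GAS BILL,,25.61
"""


def running_balance(text: str, forward: Decimal) -> Iterator[Tuple[str, Decimal]]:
    """Yield (description, balance after the transaction) for each record.

    Assumes the file is already sorted by date, as the text requires.
    """
    balance = forward
    for date, flag, number, desc1, desc2, amount in csv.reader(io.StringIO(text)):
        value = Decimal(amount)
        balance += value if flag == "C" else -value
        yield desc1, balance
```

Starting from the balance forward of 231.90, the three sample checks leave 166.92, matching the BALANCE column of figure 1.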


Step Four: Design the Process Structure 


Since in this case we are working with a single program, the process structure will ultimately represent the program structure. Were we designing an entire system, an accounts receivable system for instance, the process structure would represent many programs and the associated system procedures that would operate them. The process structure is obtained from the same logical data structure that the logical data base was derived from.

Referring again to both figures 1 and 2, 
we can begin to design the program from the 
bottom to the top. Looking first at the left- 
most bracket of figure 2, which for this step 
is labeled REPORT PROGRAM, we could 
draw a structure thus: 


REPORT PROGRAM { START PROGRAM { OPEN FILES
               {
               { END PROGRAM   { CLOSE FILES





[Figure 4 record layout, read left to right. Recoverable field labels: transaction date | check or deposit flag | description field 1 | description field 2 | amount]

Figure 4: A sequential file with a record format such as this is the simplest physical data base for the checkbook program. The information that is needed has been decided by the logical data base. The order in which it is put on the file depends on exactly what you intend to do. Since in this case we will be sorting by date, the date of the transaction appears first on the file.


Note that program structure is denoted by left to right positioning, and that sequences of operations are noted top (first) to bottom (last).

We can see that the only thing for us to do at the beginning of the program is to open the files, and the only thing to do at the end of the program is to close the files we have used. Moving right to the YEAR bracket, the process END YEAR must be defined. For this program there is nothing to do at the end of the year, so we fill in the bracket with the notation SKIP:

YEAR (1,y) { END YEAR { SKIP

For the bracket labeled MONTH, there is the matter of printing the CURRENT BALANCE at the end of the month:

MONTH (1,m) { END MONTH { PRINT CURRENT BALANCE

There are no processes to be performed at the end of each DAY; therefore, we show the END DAY process the same way as the END YEAR process:

DAY (1,d) { END DAY { SKIP





Figure 5: Completed Warnier-Orr diagram for a checkbook balancing report program. This program arrangement will probably result in the smallest amount of memory being used. The sequences of operations at any given level (left-right position) are read from top to bottom. A level of operations corresponds to a logical level of procedure calls in a block structured programming language.

REPORT PROGRAM
  BEGIN PROGRAM { OPEN FILES
                  SET INITIAL VALUES
                  READ FIRST RECORD
  YEAR (1,y)
    BEGIN YEAR { SKIP
    MONTH (1,m)
      BEGIN MONTH { PRINT HEADINGS
                    PRINT STARTING BALANCE
                    INITIALIZE RUNNING BALANCE
      DAY (1,d)
        BEGIN DAY { SKIP
        TRANSACTIONS (1,t)
          BEGIN TRANSACTIONS { SKIP
          DEBIT (0,1)  { MOVE CHECK NUMBER, CHECK "TO", AND CHECK AMOUNT TO PRINT LINE
                         SUBTRACT CHECK AMOUNT FROM RUNNING BALANCE
                         MOVE RUNNING BALANCE TO PRINT LINE
                         PRINT A LINE
                         PRINT SECOND LINE (0,1)
                         SPACE ONE LINE
          ⊕
          CREDIT (0,1) { MOVE DEPOSIT AMOUNT, DEPOSIT DESCRIPTION TO PRINT LINE
                         ADD DEPOSIT AMOUNT TO RUNNING BALANCE
                         MOVE RUNNING BALANCE TO PRINT LINE
                         PRINT A LINE
                         SPACE ONE LINE
          END TRANSACTION { GET NEXT RECORD
        END DAY { SKIP
      END MONTH { PRINT CURRENT BALANCE
    END YEAR { SKIP
  END PROGRAM { CLOSE FILES
The TRANSACTIONS process is where
most of the work is done. For each CREDIT
or DEBIT, one line (and, for a DEBIT,
possibly a second line) is printed, showing
the appropriate information; the running
balance is updated, and the next record
must be read:


TRANSACTIONS (1,t)
  DEBIT (0,1)  { MOVE CHECK NUMBER, CHECK "TO", AND CHECK AMOUNT TO PRINT LINE
                 SUBTRACT CHECK AMOUNT FROM RUNNING BALANCE
                 MOVE RUNNING BALANCE TO PRINT LINE
                 PRINT A LINE
                 PRINT SECOND LINE (0,1)
                 SPACE ONE LINE
  ⊕
  CREDIT (0,1) { MOVE DEPOSIT AMOUNT, DEPOSIT DESCRIPTION TO PRINT LINE
                 ADD DEPOSIT AMOUNT TO RUNNING BALANCE
                 MOVE RUNNING BALANCE TO PRINT LINE
                 PRINT A LINE
                 SPACE ONE LINE
  END TRANSACTION { GET NEXT RECORD
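A minimal Python sketch of the TRANSACTIONS bracket, assuming a hypothetical process_transaction function, a "D"/"C" flag, and amounts held as integer cents. The if/elif is the exclusive-or between DEBIT and CREDIT; returning the print lines instead of printing them keeps the sketch easy to check.

```python
def process_transaction(flag: str, check_no: str, desc1: str, desc2: str,
                        amount: int, running_balance: int) -> tuple[int, list[str]]:
    """One pass through TRANSACTIONS (1,t): DEBIT (0,1) exclusive-or CREDIT (0,1).

    flag is assumed to be "D" for a check and "C" for a deposit; amounts are
    integer cents. Returns the new running balance and the lines to print.
    GET NEXT RECORD (the END TRANSACTION bracket) is left to the caller.
    """
    lines: list[str] = []
    if flag == "D":                        # DEBIT (0,1)
        running_balance -= amount          # SUBTRACT CHECK AMOUNT FROM RUNNING BALANCE
        lines.append(f"{check_no} {desc1} {amount} {running_balance}")  # PRINT A LINE
        if desc2:                          # PRINT SECOND LINE (0,1)
            lines.append(desc2)
    elif flag == "C":                      # CREDIT (0,1)
        running_balance += amount          # ADD DEPOSIT AMOUNT TO RUNNING BALANCE
        lines.append(f"{desc1} {amount} {running_balance}")             # PRINT A LINE
    else:
        raise ValueError(f"unknown transaction flag: {flag!r}")
    lines.append("")                       # SPACE ONE LINE
    return running_balance, lines
```

Exactly one branch runs per record, which is what the ⊕ between DEBIT and CREDIT requires.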


With this much of the program design 
done, the only things to be filled in are the 
BEGIN brackets for each level. The entire 
diagram with these processes added is shown 
in figure 5.

Looking at the Warnier-Orr diagram for 
the checkbook balance program, you can see 
the entire series of events which must take 
place to correctly process the report as it 
was given. Note also that this is the only
correct structure that will produce the
checkbook balance report. Any other structure
that will produce the report is isomorphic
to this structure. The structure is
also optimal in operation, in the sense that
nothing is ever done unless it must be done.
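The series of events the diagram prescribes can be made visible with a small trace. This Python sketch logs the brackets of figure 5 in the order they run; it assumes (year, month, day, flag) tuples already sorted by date, and it swaps in itertools.groupby for the flag-driven loops that the BASIC version in part 2 uses. The function name report_trace is hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def report_trace(records: list[tuple[int, int, int, str]]) -> list[str]:
    """List the brackets of figure 5 in the order they execute.

    records are (year, month, day, flag) tuples sorted by date, with flag "D"
    or "C" (assumed codes). BEGIN and END entries are logged even where the
    diagram says SKIP, so that the order of events is visible.
    """
    trace = ["BEGIN PROGRAM"]
    for year, year_recs in groupby(records, key=itemgetter(0)):           # YEAR (1,y)
        trace.append(f"BEGIN YEAR {year}")
        for month, month_recs in groupby(year_recs, key=itemgetter(1)):   # MONTH (1,m)
            trace.append(f"BEGIN MONTH {month}")
            for day, day_recs in groupby(month_recs, key=itemgetter(2)):  # DAY (1,d)
                trace.append(f"BEGIN DAY {day}")
                for rec in day_recs:                                       # TRANSACTIONS (1,t)
                    trace.append("DEBIT" if rec[3] == "D" else "CREDIT")   # exclusive or
                trace.append(f"END DAY {day}")
            trace.append(f"END MONTH {month}")
        trace.append(f"END YEAR {year}")
    trace.append("END PROGRAM")
    return trace
```

Each nested loop corresponds to one bracket level. Each BEGIN entry appears once per group and each END entry appears once after that group's last record.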

The program which is coded from this 
structure will also have some predictable 
features. It will run as quickly as possible. It 
will usually require the least amount of 
storage. It is very easy to maintain, and it 
will run correctly at the first effective trial. 
Not bad dividends for a half hour of extra 
work. Syntax runs are not effective trials, 
but, with a little diligence and effort, syntax 
errors can also be brought under control. 

Part 2 will show how easy it is to fill in 
the details of structured programs using 
Warnier-Orr diagrams.


Structured Programming 


with Warnier-Orr Diagrams 


David A Higgins 


Part 2: Coding the Program 


In part 1 we carefully constructed a
design structure. To make the most of that
structure, a few words about programming
style are in order. While it is true to a
certain extent that any method of coding
the structure will produce a logically correct
program, the syntax errors that result from
shoddy coding techniques, as well as
problems with maintenance, indicate that a
great deal of care should be exercised in the
construction of the actual program code.

For this particular example, I'll use a 
fairly standard version of BASIC that 





REPORT PROGRAM
  BEGIN PROGRAM { OPEN FILES
                  SET INITIAL VALUES
                  READ FIRST RECORD
  YEAR (1,y)
    BEGIN YEAR { SKIP
    MONTH (1,m)
      BEGIN MONTH { PRINT HEADINGS
                    PRINT STARTING BALANCE
                    INITIALIZE RUNNING BALANCE
      DAY (1,d)
        BEGIN DAY { SKIP
        TRANSACTIONS (1,t)
          BEGIN TRANSACTIONS { SKIP
          DEBIT (0,1)  { MOVE CHECK NUMBER, CHECK "TO", AND CHECK AMOUNT TO PRINT LINE
                         SUBTRACT CHECK AMOUNT FROM RUNNING BALANCE
                         MOVE RUNNING BALANCE TO PRINT LINE
                         PRINT A LINE
                         PRINT SECOND LINE (0,1)
                         SPACE ONE LINE
          ⊕
          CREDIT (0,1) { MOVE DEPOSIT AMOUNT, DEPOSIT DESCRIPTION TO PRINT LINE
                         ADD DEPOSIT AMOUNT TO RUNNING BALANCE
                         MOVE RUNNING BALANCE TO PRINT LINE
                         PRINT A LINE
                         SPACE ONE LINE
          END TRANSACTION { GET NEXT RECORD
        END DAY { SKIP
      END MONTH { PRINT CURRENT BALANCE
    END YEAR { SKIP
  END PROGRAM { CLOSE FILES

Figure 1: Final Warnier-Orr diagram description of the checkbook balance report program (reproduced from part 1).
Figure 2: This is a flowchart chosen at random for comparison to a Warnier-Orr diagram. [Flowchart not reproduced. Recoverable box labels: MATCH; SUBTRACT SUBJECT V FROM OBJECT V; NEGATIVE; COMPLEMENT DIFFERENCE; IDENTIFICATION CODE; GET VERTICAL EXTENT OF DIFFERENCE; USE HMATCH TO DETERMINE SIGHTING FLAG STATUS; USE VSIGHT TO DETERMINE SIGHTING FLAG STATUS; RETURN.]
runs on a J100 Jacquard Systems computer.
The concepts and construction rules are
just as applicable to Tiny BASIC, assembly
language, and especially APL. Obeying the
following five coding conventions will help
you write a program that executes correctly
the first time.


Coding Convention 1: Names Should Be 
Indicative of Function 


For versions of BASIC that only allow 
one letter names, this is often a little hard, 
but for most other languages with multiple 
character symbols, it is a must. For instance, 
a field that contains an amount should be 
labeled AMOUNT, an address field should 
probably be called ADRESS, and so 
forth. Cutesy names such as SNEEZY, DOPEY,
GRUMPY and HELL (a perennial favorite label
for adolescent COBOL programmers) are
to be strictly avoided.


Coding Convention 2: Comments Should 
Be Used Freely 


Comment lines in programs written in 
obscure languages, APL for instance, should 
probably outnumber actual lines of code. 
Comment lines are especially useful for 
explaining unclear methods of calculation, 
complex decisions, etc. 


Coding Convention 3: Every Bracket of a 
Warnier-Orr Diagram Should Represent a 
New Subroutine 


Languages that do not permit subrou- 
tines or languages that limit the levels of 
nesting of subroutines are very tricky to 
use and should be avoided if at all possible. 
Save your spare change for three or four 
weeks and go buy a better version of BASIC; 
there are plenty of good ones on the mar- 
ket. In BASIC, each subroutine should be 
clearly labeled with REMark statements. 


Coding Convention 4: Subroutines Should 
Be as Short as Possible 


If a subroutine contains too many statements,
it is difficult to understand and maintain.
It also means you are probably doing
something in this subroutine that should
be put in a separate, lower-level subroutine.
In most high level languages a practical
limit of 10 to 20 statements is appropriate.
This rule is standard structured programming
practice.


Coding Convention 5: GO TOs Should Be 
Avoided 


In higher level languages, GO TOs can
often be eliminated entirely, and should be.
However, in versions of BASIC that do not
have a DO verb, and in assembler, GO TOs
are often necessary. Utmost care is urged
whenever a GO TO is used; it should be used
only as a last resort. In assembly language,
arbitrary jumps or branches should be
avoided.

When coding the program, the order of 
the subroutines is not crucial. The only 
piece of code that must be fixed in any 
certain location is the highest level bracket 
which must be the first executable line, 
or lines, of code. One possible way of 
coding the first section is to omit the first 
bracket and consider the code as the main 
program. For BASIC, subroutine calls are
left unnumbered until the subroutine is
actually written. Here, nnn stands for a
line number that is not yet known.


100 REM CHECKBOOK BALANCE REPORT PROGRAM
110 REM BEGIN PROGRAM
120 GOSUB nnn
130 REM YEAR (1,Y)
140 LET ENDYR = FALSE
150 GOSUB nnn
160 IF ENDYR = FALSE THEN GOTO 150
170 REM END PROGRAM
180 GOSUB nnn
190 END


Another way to program this section would 
be to have the above piece of code as a sub- 
routine to an even higher level procedure as 
follows. 


80 REM CHECKBOOK BALANCE REPORT PROGRAM 
90 GOSUB 110 
95 END 


100 through 180 as above 


200 RETURN 


Either way of coding is acceptable. Note 
that the GO TO in statement 160 is used to 
create the structure of a DO UNTIL, a 
feature that is not available with this par- 
ticular BASIC. 
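Python, like this BASIC, has no DO UNTIL. Here is a sketch of the same post-test loop using a hypothetical helper, do_until. The body always runs once before the test, just as lines 140 through 160 arrange with a flag, a GOSUB and a backward GO TO. The state dictionary and the year function are illustrative stand-ins for ENDYR and the YEAR subroutine.

```python
from typing import Callable


def do_until(body: Callable[[], None], done: Callable[[], bool]) -> None:
    """Post-test loop: run body, then test done, and repeat until done() is true."""
    while True:
        body()            # 150 GOSUB nnn
        if done():        # 160 IF ENDYR = FALSE THEN GOTO 150 (otherwise fall out)
            break


# Example: a YEAR loop that ends when a hypothetical end-of-year flag is raised.
state = {"end_yr": False, "years": 0}     # 140 LET ENDYR = FALSE


def year() -> None:
    state["years"] += 1
    if state["years"] == 2:               # pretend the second pass reaches end of file
        state["end_yr"] = True


do_until(year, lambda: state["end_yr"])
```

A pre-test while loop would not be equivalent. If the flag were already set, the body would never run, whereas DO UNTIL always runs it at least once.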

The center path of the Warnier-Orr dia- 
gram is the easiest to begin to code at this 
point. So the code for the YEAR, the 
MONTH, and the DAY routines is shown 
next; for the subroutine YEAR: 


250 REM YEARLY PROCEDURE
260 REM BEGIN YEAR
270 GOSUB nnn
280 REM MONTHS (1,M)
290 LET ENDMO = FALSE
300 GOSUB nnn
310 IF ENDMO = FALSE THEN GOTO 300
320 REM END YEAR
330 GOSUB nnn
340 RETURN


For the subroutine MONTH: 


350 REM MONTHLY PROCEDURE
360 REM BEGIN MONTH
370 GOSUB nnn
380 REM DAYS (1,D)
390 LET ENDAY = FALSE
400 GOSUB nnn
410 IF ENDAY = FALSE THEN GOTO 400
420 REM END MONTH
430 GOSUB nnn
440 RETURN


For the subroutine DAY: 


450 REM DAILY PROCEDURE
460 REM BEGIN DAY
470 GOSUB nnn
480 REM TRANSACTIONS (1,T)
490 LET ENDTRN = FALSE
500 GOSUB nnn
510 IF ENDTRN = FALSE THEN GOTO 500
520 REM END DAY
530 GOSUB nnn
540 RETURN


The TRANSACTIONS process breaks down 
as follows: 


550 REM TRANSACTIONS ROUTINE
560 REM CREDIT (0,1) OR DEBIT (0,1)
570 IF CDFLAG = CREDIT THEN GOSUB nnn ELSE GOSUB nnn
580 REM END TRANSACTION
590 GOSUB nnn
600 RETURN


Subroutine DEBIT is coded a bit differently
from the way it was designed, for one simple
reason: BASIC will let you output from the
same fields that were read in as input; many
languages do not. The MOVE ... TO PRINT
LINE steps of the design are therefore
unnecessary, and the only code remaining in
the subroutine is the subtraction of the
amount from the running balance and the
print commands.


610 REM DEBIT PROCEDURE
620 LET RUNBAL = RUNBAL - AMOUNT
650 PRINT ON PRINTR: DAY, CHKNUM, DESC1, DRAMT, CRAMT, RUNBAL
660 IF DESC2 # SPACES THEN PRINT ON PRINTR: DESC2
670 PRINT ON PRINTR: SPACES
680 RETURN


The symbol # is the not-equal-to operator.
Note that this code makes no attempt to
format the output line. Although the facility
is available with this version of BASIC, it
differs greatly from the line formatting of
other BASICs around, and would serve only to
confuse the immediate issue.
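For comparison only, here is a Python sketch of one way the columns could be lined up under the heading DAY CHECK# FOR DEBIT CREDIT BALANCE used in listing 1. The column widths, the helper names money and format_line, and the cents representation are all assumptions.

```python
from typing import Optional

# Column layout is an assumption; the headings follow listing 1.
HEADING = (f"{'DAY':>3} {'CHECK#':>6} {'FOR':<20} "
           f"{'DEBIT':>9} {'CREDIT':>9} {'BALANCE':>10}")


def money(cents: Optional[int], width: int) -> str:
    """Right-justify an amount held in cents as dollars.cents, or blanks if None."""
    if cents is None:
        return " " * width
    sign = "-" if cents < 0 else ""
    dollars, rem = divmod(abs(cents), 100)
    return f"{sign}{dollars}.{rem:02d}".rjust(width)


def format_line(day: int, check_no: str, desc: str, debit: Optional[int],
                credit: Optional[int], balance: int) -> str:
    """Build one fixed-width report line that lines up under HEADING."""
    return (f"{day:>3} {check_no:>6} {desc[:20]:<20} "
            f"{money(debit, 9)} {money(credit, 9)} {money(balance, 10)}")
```

Passing None for the unused amount column leaves it blank. That is the job the ' ' items do in the PRINT statements of listing 1.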


The CREDIT process is very similar to 
the DEBIT process. 


690 REM CREDIT PROCEDURE
700 LET RUNBAL = RUNBAL + AMOUNT
730 PRINT ON PRINTR: DAY, DESC1, CRAMT, DRAMT, RUNBAL
740 PRINT ON PRINTR: SPACES
750 RETURN


The only remaining subroutines to be 
coded appear below: 


760 REM END TRANSACTION
763 LET OLDDAY = DAY
770 INPUT FROM CHECKS: DAY, CDFLAG, DESC1, DESC2,
    CHKNUM, & AMOUNT
775 ON ENDFILE GOSUB 790
Figure 3: The original flowchart in figure 2 converted into a Warnier-Orr diagram. This is a much simpler looking diagram and is easier to follow and explain to someone. Since it is broken down into sections, it can be programmed as a series of subroutines that can be easily maintained and modified. Note that an item written with a bar over it means "the complement of item."
Listing 1: BASIC source listing for the checkbook balance report program.
Each of the subroutines can be matched with one of the brackets in the
diagram of figure 1. The individual modules that do not contain any code
should be left as they are to facilitate easy maintenance in the future.

100 REM CHECKBOOK BALANCE REPORT PROGRAM
110
120 REM BEGIN PROGRAM
130 GOSUB 1090
140
150 REM YEAR (1,Y)
160 LET ENDYR = FALSE
170 GOSUB 280
180 IF ENDYR = FALSE THEN GOTO 170
190
200 REM END PROGRAM
210 GOSUB 1290
220 END
230
240 REM ****************************************
250 REM YEARLY PROCEDURE
260
270 REM BEGIN YEAR
280 GOSUB 1470
290
300 REM MONTH (1,M)
310 LET ENDMO = FALSE
320 GOSUB 430
330 IF ENDMO = FALSE THEN GOTO 320
340
350 REM END YEAR
360 GOSUB 1390
370 RETURN
380
390 REM ****************************************
400 REM MONTHLY PROCEDURE
410
420 REM BEGIN MONTH
430 GOSUB 1210
440
450 REM DAYS (1,D)
460 LET ENDAY = FALSE
470 GOSUB 580
480 IF ENDAY = FALSE THEN GOTO 470
490
500 REM END MONTH
510 GOSUB 1340
520 RETURN
530
540 REM ****************************************
550 REM DAILY PROCEDURE
560
570 REM BEGIN DAY
580 GOSUB 1500
590
600 REM TRANSACTIONS (1,T)
610 LET ENDTR = FALSE
620 GOSUB 720
630 IF ENDTR = FALSE THEN GOTO 620
640
650 REM END DAY
660 GOSUB 1430
665 RETURN
670
680 REM ****************************************
690 REM TRANSACTIONS PROCEDURE
700
710 REM CREDIT (0,1) OR DEBIT (0,1)
720 IF CDFLAG = DEBIT THEN GOSUB 800 ELSE GOSUB 890
730
740 REM END TRANSACTION
750 GOSUB 965
760 RETURN
770
775 REM ****************************************
780 REM DEBIT PROCEDURE
790
800 LET RUNBAL = RUNBAL - AMOUNT
810 PRINT :DAY;CHKNUM;DESC1;' ';AMOUNT;RUNBAL
820 IF DESC2 # ' ' THEN PRINT :SPACES;DESC2
830 PRINT :SPACES
840 RETURN
850
860 REM ****************************************
870 REM CREDIT PROCEDURE
880
890 LET RUNBAL = RUNBAL + AMOUNT
900 PRINT :DAY;' ';DESC1;AMOUNT;' ';RUNBAL
910 PRINT :SPACES
920 RETURN
930
940 REM ****************************************
950 REM END TRANSACTION
960
965 LET OLDDAY = DAY
970 INPUT FROM CHECKS:DAY,CHKNUM,CDFLAG,DESC1,DESC2,AMOUNT
980 ON ENDFILE CHECKS GOSUB 1030
985 IF OLDDAY # DAY THEN LET ENDTR = TRUE
990 RETURN
1000
1010 REM ****************************************
1020 REM END OF FILE
1025
1030 LET ENDAY, ENDMO, ENDTR, ENDYR = TRUE
1040 RETURN
1050
1060 REM ****************************************
1070 REM BEGIN PROGRAM PROCEDURE
1080
1090 OPEN 'CHECKS',SYMBOLIC,INPUT: CHECKS
1100 STRING SPACES, CDFLAG, DESC1, DESC2, MONTH
1110 DECIMAL AMOUNT, BALANC, RUNBAL
1120 LET TRUE = 1
1130 LET FALSE = 0
1140 LET SPACES = ' '
1150 INPUT FROM CHECKS: DAY, CHKNUM, CDFLAG, DESC1, DESC2, AMOUNT, BALANC,
     & MONTH, YEAR
1160 RETURN
1170
1180 REM ****************************************
1190 REM BEGIN MONTH
1200
1210 PRINT :'CHECKBALANCE REPORT'
1220 PRINT :'FOR THE MONTH OF ';MONTH;YEAR
1230 PRINT :SPACES,'BALANCE FORWARD OF ';BALANC
1235 LET RUNBAL = BALANC
1240 PRINT :'DAY CHECK# FOR DEBIT CREDIT BALANCE'
1250 RETURN
1260
1265 REM ****************************************
1270 REM END PROGRAM
1280
1290 CLOSE CHECKS
1300 RETURN
1310
1315 REM ****************************************
1320 REM END MONTH
1330
1340 PRINT :'CURRENT BALANCE ',RUNBAL
1350 RETURN
1360
1365 REM ****************************************
1370 REM END YEAR
1380
1390 RETURN
1400
1405 REM ****************************************
1410 REM END DAY
1420
1430 RETURN
1440
1445 REM ****************************************
1450 REM BEGIN YEAR
1460
1470 RETURN
1480
1485 REM ****************************************
1490 REM BEGIN DAY
1495
1500 RETURN

778 IF OLDDAY # DAY THEN LET ENDTRN = TRUE
780 RETURN
790 REM END OF CHECK FILE DEFAULT SUBROUTINE
800 LET ENDAY, ENDTRN, ENDMO, ENDYR = TRUE
810 RETURN
820 REM BEGIN MONTH PROCEDURE
830 PRINT ON PRINTR: HDR1$
840 LET RUNBAL = BALANC
850 PRINT ON PRINTR: RUNBAL
860 PRINT ON PRINTR: HDR2$
870 PRINT ON PRINTR: SPACES
880 RETURN
890 REM END MONTH PROCEDURE
900 PRINT ON PRINTR: RUNBAL
920 RETURN
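The END TRANSACTION logic amounts to a control break: remember the old day, read the next record, end the day's TRANSACTIONS loop when the day changes, and end every loop at end of file. Here is a Python sketch of that step over an in-memory file. The class name CheckFile, the tuple record with the day first, and the attribute names standing in for ENDTRN, ENDAY, ENDMO and ENDYR are assumptions.

```python
from typing import Optional


class CheckFile:
    """Sketch of the END TRANSACTION step (lines 763-800) over an in-memory file."""

    def __init__(self, records: list[tuple]) -> None:
        self._it = iter(records)
        self.current: Optional[tuple] = next(self._it, None)  # READ FIRST RECORD
        at_eof = self.current is None
        self.end_trn = self.end_day = self.end_mo = self.end_yr = at_eof

    def end_transaction(self) -> None:
        old_day = self.current[0]                 # 763 LET OLDDAY = DAY
        self.current = next(self._it, None)       # 770 INPUT FROM CHECKS: ...
        if self.current is None:                  # 775/800: end of file ends every level
            self.end_trn = self.end_day = self.end_mo = self.end_yr = True
        elif self.current[0] != old_day:          # 778: a new day ends this day's transactions
            self.end_trn = True
```

As in the BASIC, the caller (the DAY procedure, line 490) must reset end_trn before starting the next day's transactions.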


The program is finished by writing the BEGIN
PROGRAM and END PROGRAM subroutines,
which are not developed here, and by
replacing the nnn placeholders in the GOSUBs
coded earlier with actual line numbers. The
modules for which a GOSUB was generated
should probably remain a part of the program
even though they contain no code; they make
maintenance much easier. The entire working
program, with formatting and other
embellishments, appears in listing 1.


Conclusion 


The art of programming has become 
a process which can be taught to anyone 
who needs to use it, which is something 
that we have not been able to accomplish 
until very recently. Admittedly, the
technique for developing programs presented
here is sometimes tedious and not very
creative, but it will get the job done. In the
personal computer field a lot of enthusiasts
probably enjoy programming on the fly
and spending all night debugging. But
for those who don't, including myself,
and who aren't satisfied with just running
someone else's canned programs, there is
an alternative. As the pioneer in this
methodology, Jean-Dominique Warnier, puts
it: "If you don't have time to do it right,
do you have time to do it over?"
Realistically, one cannot say that this
methodology is the ultimate in software
process design or that it is completely right.
It is not. Something is sure to come along in
the future that is better. But, for now, it is
certainly a large step in the right direction.


Once I finished reading about the ease
with which Warnier-Orr diagrams could
be used, I decided to take a sample flowchart
and convert it into the Warnier-Orr form to
see how much of a difference there actually
was. I happened to be working on an article
by Geoffrey Guss (entitled "Starfleet")
which contained a large number of
flowcharts. Choosing one at random, I
converted it. Figure 2 is the original
flowchart. Figure 3 is the converted diagram.
I think the Warnier-Orr form is much easier
to read and understand.

When designing with flowcharts, it is
sometimes difficult to avoid crossing lines or
a great deal of redundancy in the program,
which makes it difficult to follow.
All the arrows going across the paper are
very distracting and hard to follow. The
Warnier-Orr diagram does not have this
disturbing problem. It is very easy to follow
the program through the various
subroutines.

The Warnier-Orr diagram lends itself 
to structured program writing. If you con- 
sider each of the separate brackets another 
subroutine it is very easy to write the pro- 
gram just as it stands from top to bottom. 
When we use conventional flowchart
techniques, we end up leaping about the
program to perform statements that are in
various parts of the same routine. In my
opinion the Warnier-Orr diagram is a
quantum leap forward as an aid for
structured program designers.


Ray Cote 
Editor 
BYTE Publications 





Warnier-Orr Diagrams: 


GT Wedemeyer 


The article "Structured Program Design"
in the October 1977 BYTE, page 146,¹ has
certainly simplified my thinking. However,
the use of the symbol ⊕ seems to violate
a rule implicit in the Warnier-Orr diagram:
that one need not, and in fact must not, go
up in a list contained within a bracket of a
given order. The ⊕ symbol requires
checking up and down the list of case
statements. I believe that what is meant
is illustrated in figure 1. In this example,
CASE J is equivalent to ROLL = "J."
This manner of diagramming clarifies the
relationship between statements having
alternatives and statements not having
alternatives. It also eliminates the need for
the instruction SKIP, since finding no more
items in a list of a given order is equivalent
to an instruction to return to the proper
place in the list of the next lower order,
where the order of a list is its position from
left to right, as shown in figure 2.

I would like to define the instruction
RETURN to mean "in the list of next lower
order than the list in which this instruction
is found, complete the step immediately
following the lowest completed step."
Although this instruction seems implicit,
as I indicated above, I would prefer that
it be explicitly stated, and I think it would
make the diagrams easier to follow.


Dave Higgins replies: 


It appears from your letter that you are
very interested in using the Warnier-Orr
diagramming techniques. I think you will be
pleased with the results.

I'd like to comment on the suggestions 
you made for improving the diagrams. 
Unlike flowcharts, which have become quite 
rigid and inflexible in form, the Warnier- 
Orr diagrams are still in a relative infancy, 
and do still change occasionally. We here at 
Langston, Kitch have made some minor 
modifications to the diagrams in the last 
year in order to add some capabilities that 
were previously vague or nonexistent. 
We are continually evaluating the diagrams, 
looking for shortcomings or ambiguities, 
and therefore welcome suggestions along 
these lines. It is in this light that I considered
your suggestions for revising some of the
notation.


¹ Page 3 of this edition.


Some Further Thoughts 





Figure 1.
BEGIN TURN (= ROLL DIE)
   DETERMINE "J"   { PICK RANDOM "J" BETWEEN ONE AND SIX
   CHOOSE CASE "J" { CASE 1
                   { CASE 2
                   { CASE 3
                     ...
                   { CASE 6
                   (RETURN)

Figure 2.
ORDER 1   ORDER 2   ORDER 3 ..... ORDER n


First of all, with respect to your ideas
concerning the representational form of a
CASE statement: I think your objection
to the use of the ⊕ symbol stems from the
fact that there are two primary ways to
actually code a CASE structure. One way is
with the use of a "computed GOTO or
GOSUB." The diagram you show is ideally
suited for translation into a computed
GOTO, which would look something like
listing 1. But I don't think this is a
worthwhile change to make to the basic form
of the diagrams themselves. The reason is
this: although your method works fine for
CASE statements that lend themselves to
computed GOTOs, there are a whole host
of other CASE statements where the use of
a computed GOTO is an extreme
inconvenience. Take, for example, the CASE of
figure 3. It would be inconvenient to have to
rig up a computed GOTO to execute this
CASE. It is much simpler to code it using
a "nested IF" statement, which is the other


Figure 3.
   DAY = MONDAY (0,1)    { ...
   ⊕
   DAY = TUESDAY (0,1)   { ...
   ⊕
   DAY = WEDNESDAY (0,1) { ...


Listing 1.
300 REM CASE STATEMENT
310 REM DETERMINE CASE "J"
320 LET J=INT(RND(0)*6+1)
330 ON J GOTO 340,380,420,460,500,540
340 REM CASE 1
350 .
     case 1 process
370 GOTO 570
380 REM CASE 2
390 .
     case 2 process
410 GOTO 570
     cases 3-6 as above
570 REM END CASE


Listing 2.
300 REM CASE STATEMENT
310 IF D$="MONDAY" THEN 330 ELSE IF D$="TUESDAY" THEN 360
    ELSE IF D$="WEDNESDAY" THEN 400
320 GOTO 440
330 REM CASE 1: DAY = MONDAY
     monday process
350 GOTO 440
360 REM CASE 2: DAY = TUESDAY
     tuesday process
380 GOTO 440
390 REM CASE 3: DAY = WEDNESDAY
     wednesday process
440 REM END CASE


Listing 3.
300 REM CASE STATEMENT
310 IF D$="MONDAY" THEN 350
320 IF D$="TUESDAY" THEN 400
330 IF D$="WEDNESDAY" THEN 450
340 GOTO 500
350 REM CASE 1: DAY = MONDAY
     monday process
390 GOTO 500
400 REM CASE 2: DAY = TUESDAY
     tuesday process
440 GOTO 500
450 REM CASE 3: DAY = WEDNESDAY
     wednesday process
500 REM END CASE


popular way to code CASE statements. In 
pseudocode, this CASE is: 


IF DAY = MONDAY 

THEN MONDAY-ROUTINE 
ELSE IF DAY = TUESDAY 

THEN TUESDAY-ROUTINE 
ELSE IF DAY = WEDNESDAY 

THEN WEDNESDAY-ROUTINE 


You can see the natural one-to-one
correspondence between the Warnier-Orr diagram
and the pseudocode. This is easily translated
to code in listing 2. Listing 3 shows an
alternative for those BASICs without the
nested IF capability. This is the preferred
method for coding a CASE statement because
it will work for all CASE statements,
regardless of whether or not the
CASE is suited for a computed GOTO.
Also, with the computed GOTO, you must
be sure that your "J" is restricted to the
proper range. This is not to say that you
can never use the computed GOTO; just
be sure that its use is justified, and then
be very careful. Personally, I feel it is more
trouble than it is worth.
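The two ways of coding a CASE can be sketched in Python. A list of functions indexed by the case number plays the role of ON J GOTO, and an if/elif chain plays the role of the nested IF. The function names are hypothetical. Note the explicit range check that the table version needs. How an out-of-range ON ... GOTO behaves varies between BASICs.

```python
from typing import Callable


def monday() -> str:
    return "monday process"


def tuesday() -> str:
    return "tuesday process"


def wednesday() -> str:
    return "wednesday process"


# Computed-GOTO style: the case number indexes a table directly,
# so it must be checked against the table's range first.
CASES: list[Callable[[], str]] = [monday, tuesday, wednesday]


def case_by_index(j: int) -> str:
    if not 1 <= j <= len(CASES):
        raise ValueError(f"case index {j} out of range 1..{len(CASES)}")
    return CASES[j - 1]()


# Nested-IF style: works for any test, not just small consecutive integers.
def case_by_test(day: str) -> str:
    if day == "MONDAY":
        return monday()
    elif day == "TUESDAY":
        return tuesday()
    elif day == "WEDNESDAY":
        return wednesday()
    return ""                  # no case matched: fall through to END CASE
```

The if/elif version needs no range check and reads top to bottom like the diagram, which is Higgins's argument for preferring it.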

As for the elimination of the brackets
with "SKIP" in them: I don't believe that
you really want to do this. For instance, in
the BUG game published in the October
1977 BYTE, no action is taken when a
player rolls a “BODY” on the dice but 
already has a body. This bracket is filled 
with the notation “SKIP,” which indicates 
that, although the bracket is an essential 
part of the logic of the diagram, nothing 
is to be done there. However, in future 
versions of the game, you might just decide 
to tell the player that “YOU ALREADY 
HAVE A BODY” when that condition 
occurs. If the original diagram is left with 
the empty brackets intact, you have a 
fixed and ready place to put that PRINT 
command. The design is very easy to change 
and the documentation for the new program 
is only a matter of erasing one line and 
replacing it with another. 
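Here is a Python sketch of why the empty bracket is worth keeping. It uses a hypothetical dict-based bug and two versions of the BODY roll. In the first version the HAVE A BODY branch does nothing, which corresponds to SKIP. The second version shows the one-line change Higgins anticipates, made in the place the SKIP bracket reserved.

```python
def body_roll_v1(bug: dict[str, int]) -> list[str]:
    """Roll of 1 ("BODY"), first version of the game."""
    messages: list[str] = []
    if not bug["body"]:            # NO BODY (0,1)
        bug["body"] = 1            # GIVE A BODY
    else:                          # ALREADY HAS A BODY (0,1)
        pass                       # SKIP: kept as a placeholder bracket
    return messages


def body_roll_v2(bug: dict[str, int]) -> list[str]:
    """Later version: only the former SKIP bracket changes."""
    messages: list[str] = []
    if not bug["body"]:
        bug["body"] = 1
    else:
        messages.append("YOU ALREADY HAVE A BODY")   # replaces SKIP
    return messages
```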

Also, I don't believe that we need to add
the (RETURN) command at the end of the
brackets as you suggest. As you state, the
return to the next highest level in the 
diagram is already implied at the end of each 
bracket: therefore adding (RETURN) on 
each bracket would amount to a lot of 
“busywork,” which would clutter up 
the diagrams with a lot of unnecessary 
information. 

Again, I'd like to thank you for your
suggestions and extend an invitation for all
the readers of BYTE to submit their
suggestions for improvement of the
Warnier-Orr diagrams to either Langston, Kitch and
Associates or to me for examination.
An Outline Method For


Program Design 


Since I am dedicated to being human, I
always try to maximize the returns of an
effort while minimizing the effort (a more
technical way of saying "Getting the mostest
for the leastest"). Therefore, when I read the
article by David A Higgins on the
Warnier-Orr diagram techniques¹, I thought "AHA!"
(or something like that). This is it! The
method I now use requires thought, logic
and care to get good results. Perhaps the
Warnier-Orr method is easier.

I carefully, logically, and thoughtfully
constructed a Warnier-Orr diagram of a
program. It worked. I then carelessly,
illogically, and unthinkingly constructed
a Warnier-Orr diagram of a program. It not
only bombed, it hung the computer up.

The conclusion is, therefore, obvious. If
I already have a method that works every
time I use it and I'm familiar with it, why
change? Well, so much for the obvious. If
it's better, change.

I studied the Warnier-Orr diagram that
Mr Higgins included in his article to
determine if it was better than my method or if
it had something more to offer (I carefully
laid aside most of my prejudices), when,
lo and behold, the two methods are the
same; only the form is different. Let me
sneak in an advantage of my method. It
can be stuck in the program as a remark.

I did just that in my version of BUG.
You can see that my version (listing 1) of
the Warnier-Orr method is in the form of a
simple block outline, similar to the type we
all learned, and forgot, in school. It simply
outlines in logical sequence what you want done.
Whenever a question needs to be answered,
a substatement is generated until all the
questions are answered. If nothing happens,
simply continue on (just like life).
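Each question in such an outline becomes a test, and each answer becomes a nested statement. Here is a Python sketch of the end-of-turn questions from the BUG outline in listing 1, with hypothetical names and one dict per bug. To produce exactly one message, this sketch checks "are they both winners?" first, which departs slightly from the order of the outline.

```python
def is_winner(bug: dict[str, int]) -> bool:
    # ARE THERE 6 LEGS? -> ARE THERE 2 ANTENNAE? -> IS THERE 1 TAIL?
    return bug["legs"] == 6 and bug["antennae"] == 2 and bug["tail"] == 1


def round_result(computer: dict[str, int], player: dict[str, int]) -> str:
    """Answer the outline's end-of-round questions for both bugs."""
    c, p = is_winner(computer), is_winner(player)
    if not (c or p):               # IS THERE A WINNER?  NO: CONTINUE AT C
        return "CONTINUE"
    if c and p:                    # ARE THEY BOTH WINNERS?  YES: IT'S A DRAW
        return "IT'S A DRAW"
    return "THE COMPUTER WINS" if c else "THE PLAYER WINS"
```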

Try either the Warnier-Orr method or
this method. They both work, and all you
have to lose are ulcers, sleepless nights . . .

Jerry Goff


Listing 1: FORTRAN version of BUG program using logical diagramming 
comments. The entire logic of the program is inserted at the start for docu- 
mentation purposes. This way you are never further from the documentation 
than a program listing. 


FTN4,L
C     *****************************************************************
      PROGRAM BUG
C     *****************************************************************
C     A) DIMENSION ARRAYS
C     B) INITIALIZE PARAMETERS
C     C) SET COUNTER FOR COMPUTERS TURN
C     D) ROLL DIE
C          DIE=1?
C            YES  GIVE A BODY
C          DIE=2?
C            YES
C              HAVE A BODY?
C                YES  GIVE A NECK
C          DIE=3?
C            YES
C              HAVE A NECK?
C                YES  GIVE A HEAD
C          DIE=4?
C            YES
C              HAVE A HEAD?
C                YES
C                  FEWER THAN 2 ANTENNAE?
C                    YES  GIVE 1 ANTENNA
C          DIE=5?
C            YES
C              HAVE A BODY?
C                YES  GIVE A TAIL
C          DIE=6?
C            YES
C              HAVE A BODY?
C                YES
C                  FEWER THAN 6 LEGS?
C                    YES  GIVE 1 LEG
C          ARE THERE 6 LEGS FOR THIS PLAYER?
C            YES
C              ARE THERE 2 ANTENNAE?
C                YES
C                  IS THERE 1 TAIL?
C                    YES  SAVE THIS PLAYER AS A WINNER
C          HAVE BOTH PLAYERS HAD THEIR TURN?
C            NO   SET COUNTER FOR PLAYERS TURN & CONTINUE AT D
C          IS THERE A WINNER?
C            NO   CONTINUE AT C
C            YES  PRINT THE SCORES
C              IS THE COMPUTER THE WINNER?
C                YES  PRINT THE COMPUTER WINS
C                NO   PRINT THE PLAYER WINS
C              ARE THEY BOTH WINNERS?
C                YES  PRINT IT'S A DRAW
C     E) PLAY AGAIN?
C            YES  CONTINUE AT B
C            NO   END PROGRAM
C     *****************************************************************
C     HP 21MX COMPUTER                                  JERRY E. GOFF
C     BUG ROWS: 1=BODY 2=NECK 3=HEAD 4=ANTENNA 5=TAIL 6=LEGS
C     WIN(1)=COMPUTER  WIN(2)=PLAYER
C     *****************************************************************
C     BUG COLUMNS: 1=COMPUTER 2=PLAYER
C     *****************************************************************
      DIMENSION BUG(6,2),WIN(2),ITIME(5),IYEAR(4)
      DATA L,M,INTEG,REAL /181,86,3925,325.0/
Listing 1, continued:




    1 DO 3 I=1,2
      DO 2 J=1,6
      BUG(J,I)=0
      WIN(I)=0
    2 CONTINUE
    3 CONTINUE
C
C*************************************
C*  CALL THE TIME FROM THE COMPUTER  *
C*************************************
      ICODE=11
      CALL EXEC(ICODE,ITIME,IYEAR)
      X=FLOAT(ITIME(1))
      X=X/170.0
C
C*************************************
C*  START THE GAME, COMPUTER 1ST     *
C*************************************
    6 DO 100 K=1,2
C
C*************************************
C*  ROLL THE DIE                     *
C*************************************
    7 IX=INT(X*REAL)
      IRAND=MOD(M*IX+L,INTEG)
      X=(FLOAT(IRAND)+0.5)/REAL
      N=INT(10.0*X)
      IF (N.GT.6.OR.N.LT.1) GOTO 7
C
C*****************************************
C*  GO TO THE ADDRESS CALLED BY THE DIE  *
C*****************************************
      GO TO (10,20,30,40,50,60),N
   10 BUG(1,K)=1
      GOTO 70
   20 IF (BUG(1,K).EQ.1) BUG(2,K)=1
      GOTO 70
   30 IF (BUG(2,K).EQ.1) BUG(3,K)=1
      GOTO 70
   40 IF (BUG(3,K).EQ.1.AND.BUG(4,K).LT.2) BUG(4,K)=BUG(4,K)+1
      GOTO 70
   50 IF (BUG(1,K).EQ.1) BUG(5,K)=1
      GOTO 70
   60 IF (BUG(1,K).EQ.1.AND.BUG(6,K).LT.6) BUG(6,K)=BUG(6,K)+1
C
C*************************************
C*  CHECK IF THERE IS A WINNER       *
C*************************************
   70 IF (BUG(4,K).EQ.2.AND.BUG(5,K).EQ.1.AND.BUG(6,K).EQ.6) WIN(K)=1
  100 CONTINUE
C
C*************************************
C*  JUST SOME FORMAT STATEMENTS      *
C*************************************
   75 FORMAT (/,"COMPUTER HAS:")
   80 FORMAT (" PLAYER HAS:")
   85 FORMAT (I2,2X,"BODY,",I2,2X,"NECK,",I2,2X,"HEAD,",
     1I2,2X,"ANTENNA,",I2,2X,"TAIL,",I2,2X,"LEGS")
C
C*********************************************
C*  CHECK FOR A WINNER AFTER BOTH HAVE PLAYED *
C*********************************************
      IF (WIN(1).EQ.0.AND.WIN(2).EQ.0) GOTO 6
C
C*******************************************************
C*  IF THERE IS A WINNER, WRITE THE SCORES AND WHO WON  *
C*******************************************************
      WRITE(10,75)
      WRITE(10,85) BUG(1,1),BUG(2,1),BUG(3,1),BUG(4,1),BUG(5,1),BUG(6,1)
      WRITE(10,80)
      WRITE(10,85) BUG(1,2),BUG(2,2),BUG(3,2),BUG(4,2),BUG(5,2),BUG(6,2)
      IF (WIN(1).EQ.1) WRITE(10,110)
  110 FORMAT (" DUE TO INCREDIBLE SKILL, I WIN")
      IF (WIN(2).EQ.1) WRITE(10,120)
  120 FORMAT (" WITH ALL YOUR LUCK, YOU MANAGED TO WIN")
      IF (WIN(1).EQ.1.AND.WIN(2).EQ.1) WRITE(10,130)
  130 FORMAT (" BUT IT'S A DRAW ANYHOW",/)
      WRITE(10,140)
C
C*************************************
C*  PROGRAM EXIT (THIS IS HANDY FOR ST
C*************************************
  140 FORMAT (" WANT TO PLAY AGAIN? 1=YES, 2=NO ")
      READ(10,*) ANS
      IF (ANS.EQ.1) GOTO 1
      END

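The die roll in the listing is a small linear congruential generator followed by a rejection test: any N outside 1 to 6 sends control back to statement 7 for another try. The Python sketch below mirrors that arithmetic line for line. The constants are hypothetical (a full-period choice, m = 4096, a = 109, c = 1) rather than the values in the DATA statement, and make_die and roll are illustrative names.

```python
def make_die(seed, m=4096, a=109, c=1):
    """Return a roll() function that mimics the die logic of the listing.

    m, a and c are hypothetical stand-ins for the listing's INTEG/REAL,
    M and L. They give a full-period generator, so every residue mod m
    eventually appears and the rejection loop always terminates.
    """
    x = seed  # a fraction in [0, 1), playing the role of X in the listing

    def roll():
        nonlocal x
        while True:
            ix = int(x * m)              # IX=INT(X*REAL)
            irand = (a * ix + c) % m     # IRAND=MOD(M*IX+L,INTEG)
            x = (irand + 0.5) / m        # X=(FLOAT(IRAND)+0.5)/REAL
            n = int(10.0 * x)            # N=INT(10.0*X)
            if 1 <= n <= 6:              # otherwise roll again (GOTO 7)
                return n

    return roll


roll = make_die(0.37)
print([roll() for _ in range(10)])
```

Seeding from the clock, as the listing does with the EXEC time call, simply means choosing a different starting fraction for each game.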

Common Mistakes Using Warnier-Orr Diagrams


In my opinion, one of the best program 
and system design methods is the Warnier- 
Orr structured systems design approach, 
which I described previously (“Structured
Program Design,” page 146, October 1977
BYTE; “Structured Programming with
Warnier-Orr Diagrams,” page 104, December
1977 and page 122, January 1978
BYTE). This article is being presented be- 
cause of the interest expressed in this sub- 
ject, and because a lot of people will be 
trying these techniques for the first time. 
Newcomers to this methodology often have 
many questions about their work, and want 
to know whether or not what they are doing 
is correct. The purpose of this article is to 
outline a few of the more common mistakes 
that beginners make when using this tech- 
nique. 


Philosophical Errors 


Many first time users of the Warnier-Orr 
diagrams tend to make mistakes which are so 
similar that they are worth examining. The 
biggest and most common mistakes tend to 
be a direct result of what we can call philo- 
sophical errors; not really a misuse of the 
techniques so much as a misunderstanding of 
the techniques. The most common error 
stems from the fact that many computer 
programmers tend to be obsessed with the 
desire to write some kind of code at the very 
beginning of the design process. This prob- 
lem usually manifests itself in any or all of 
the following three ways: 

• Trying to code the program while designing it (called the design-a-little, code-a-little approach).

• Relying too heavily on language restrictions and considerations while doing logical design.

• Skipping the design phase altogether because:

   a) the program is “too easy” or
   b) the programmer is “too smart.”


Any of the above practices will destroy most if not all of the effectiveness of the Warnier-Orr methodology [or any other structured programming methodology, for that matter . . . CH]. It will certainly cause you to waste a great deal of time.

Editorial Note...

Since publishing David Higgins’ first two articles on Warnier-Orr diagramming techniques, we have received a number of letters from people expressing the message (paraphrased) “if I have this or that self-documenting structured programming language, why should I use Warnier-Orr techniques? After all, if a program in my language is logically equivalent to the Warnier-Orr structure, and it is directly executable, I see no need for an extra layer of documentation.”

A very real answer to this objection is that it is correct. There is no point to using Warnier-Orr techniques if you properly use a language such as PASCAL which, having structured programming constructs built in, allows long descriptive names for variables and procedures, and as a result can support self-documenting code.

But most currently used languages in personal computing do not easily support self-documenting code and modern concepts of structured programming. The usefulness of the Warnier-Orr methodology is that it provides a disciplined way of imposing such structure on a language such as BASIC, FORTRAN or assembly language. In effect, the Warnier-Orr discipline is a programming language which is intended for hand translation into one of the existing unstructured languages. . . Carl Helmers, Editorial Director

BYTE Publications

If you try to use the first technique, the 
design-a-little, code-a-little approach, you 
will probably be in for quite a bit of erasing 
or retyping when you have to change the 
design because you coded yourself into a 
corner that you can’t design your way out 
of. Your program will tend to be twice as 
long as it should have been and half as 
efficient. You will probably be in for a lot of 
debugging runs while trying to put back into 
the code everything that you left out when 
you changed the design. As you can see, this 
technique just naturally generates problems. 

The second technique described above is a common mistake that veteran programmers almost always seem to make: relying too heavily on the programming language they will be using while doing the program design. Consider the two examples of program designs shown in figure 1. Both figures 1a and 1b are diagrams of the same process: computation of overtime wages. The diagram in figure 1a, however, seems to be the type that veteran programmers will almost always try to draw. Note its heavy stress on the language aspect of the function. It almost looks like part of a BASIC program cut out and pasted on a diagram. Contrast that diagram with the one of figure 1b, which correctly details the logical process being performed. You can see that if figure 1a were the only documentation for this particular procedure, you would probably not be able to tell what that piece of code was supposed to be doing. You might have some idea because this program seems to have semimeaningful field names from which you might deduce some purpose. All we can tell for sure from figure 1a is that some part of the program is going to crunch a couple of numbers. What numbers it is going to crunch and just what for are anyone’s guess. On the other hand, it is impossible to misunderstand what the process diagrammed in figure 1b is doing. It is very easy to read and comprehend because it shows the logical side of the procedure.

(a)  1000
     { H1 > 40.0 (0,1)        { V1 = (H1-40.0)*(S1*1.5)
       ⊕
       NOT (H1 > 40.0) (0,1)  { V1 = 0.0

(b)  COMPUTE OVERTIME PAY
     { HOURS WORKED > 40 (0,1)        { SET OVERTIME PAY =
                                        (HOURS WORKED OVER 40) TIMES
                                        (SALARY AT TIME AND ONE-HALF)
       ⊕
       NOT (HOURS WORKED > 40) (0,1)  { SET OVERTIME PAY = ZERO

Figure 1: DOs and DON'Ts of Warnier-Orr diagramming. Figure 1a looks like actual program code and should not be used when trying to logically design a program. Figure 1b shows the correct method. The entire diagram contains only logical statements which could be coded into any computer language.

This stress of the logical over the physical 
while designing with the Warnier-Orr dia- 
grams is essential to their correct usage. 
Designing as in figure 1a serves absolutely no 
purpose as far as understanding the process 
that is being described and is essentially 
worthless as far as documentation is con- 
cerned. Even though you might be able to 
tell what that diagram does the day you 
draw it, you probably won’t be able to 
understand it in six months. Someone else 
who wants to use your documentation might 
never understand it. 

[Graph: productivity plotted against time.]

Figure 2: Typical productivity curve of programmer being introduced to Warnier-Orr diagram methodology.

As long as we’re on the subject of documentation, I might mention that through the development period of this technique, many people were concerned that the diagrams might become too far removed from the actual code, which would render them useless as effective documentation. They worried that since the diagrams depicted the logical side of the problem, they had little or no relevance to the physical (real world) side. Those fears were easily put aside with two diagramming and coding conventions, as follows:


• Physical mileposts on the Warnier-Orr diagrams.
• Logical symbol tables in the programs.


Thus, when we actually wrote code that 
looked like that of figure 1a, we would tie it 
to the logical figure 1b by adding the follow- 
ing to the diagram. 


:STMT# 1000 


COMPUTE OVERTIME PAY 


This would be included in the program itself
by using comment statements:


1000 REM COMPUTE OVERTIME PAY
1001 REM HFLD = HOURS WORKED
1002 REM OVTFLD = OVERTIME PAY
1003 REM SALFLD = SALARY


This allows us to have a very clear and 
concise, one to one mapping between the 
logical diagram and the physical code. Refer- 
ences between the two diagrams are quite 
easy. If, for instance, you want to know 
what a particular section of code is supposed 
to be doing, you need only to look it up on 
the logical diagram. Similarly, if you want to 
find out which part of the program is 
carrying out a particular logical function, 
you have the location information at your 
fingertips. This is excellent documentation 
in the event that you or someone else might
someday want to make a modification to 
your code. 
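The milepost and symbol-table conventions are not tied to BASIC. As a sketch only, here is the overtime procedure of figure 1b written in Python, with the milepost and the logical symbol table carried as comments. The function name and the lower-case field names are hypothetical, and SALARY is assumed to be an hourly rate.

```python
# STMT# 1000: COMPUTE OVERTIME PAY  (milepost tying this code to the diagram)
# Logical symbol table (hypothetical names, mirroring the REM lines):
#   hfld   = HOURS WORKED
#   salfld = SALARY (assumed here to be an hourly rate)
#   ovtfld = OVERTIME PAY


def compute_overtime_pay(hfld: float, salfld: float) -> float:
    """COMPUTE OVERTIME PAY: one branch per (0,1) case of figure 1b."""
    if hfld > 40:
        # HOURS WORKED > 40:
        # (hours worked over 40) times (salary at time and one-half)
        ovtfld = (hfld - 40) * (salfld * 1.5)
    else:
        # NOT (HOURS WORKED > 40): set overtime pay = zero
        ovtfld = 0.0
    return ovtfld


print(compute_overtime_pay(45, 10.0))  # 75.0
```

Anyone reading the code can look up "STMT# 1000" on the diagram, and anyone reading the diagram can find the code by the same milepost.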

The third common philosophical error, 
that of skipping the design phase altogether, 
is a real problem to most newcomers. In 
fact, if you look at a typical productivity 
curve for a programmer who is introduced to 
the Warnier-Orr diagrams, it generally looks 
something like the curve in figure 2. 

A currently productive programmer, producing
work at a constant rate up until the
time the Warnier-Orr techniques are intro-
duced (point A), will typically show an initial
burst of very high productivity (point B).
This is usually followed by a slump (point C) 
where the programmer sinks back to or just 
above his previous level of work. Eventually, 
he will climb back up to a new, higher level 
of work (point D), where he will usually stay. 
This peculiar slump at point C seems to
occur primarily because the programmer,
having begun to feel comfortable with the
new technique and having had some initial
success with it, feels confident enough to
try to do the work without doing the
diagrams first. He soon realizes that the
quality of his work has dropped off and
starts to do the diagrams once again, this
time for good, and his work level rises to
a new, higher level that will remain fairly
constant.

Apparently, the only way to get new 
people to avoid this temptation is to fore- 
warn them that it does tend to happen, so 
that if and when they find themselves on the 
downhill side of the productivity curve, they 
can recognize the trap in time to escape the 
worst of it. 


DETERMINE UNIT PRICE
{ PRODUCT CODE = A (0,1)  { UNIT PRICE IS IN PRICE FIELD #1
                          { GO TO END PRICE
  ⊕
  PRODUCT CODE = B (0,1)  { UNIT PRICE = $7.00
                          { GO TO END PRICE
  ⊕
  PRODUCT CODE = C        { GO TO COMPUTE MARKET PRICE

  END PRICE

Figure 3: Example case statements making use of logically illegal GOTO statements. When a set of statements is finished, the diagram will logically fall through all of the other exclusive ORs, ⊕, and arrive at the END PRICE section. Thus no GOTO need be shown.


So much for the philosophical errors. 
There are also a few common technical 
errors that people make, and we’ll look at 
those next. 


Technical Errors 


For a lot of people who are just starting 
to program and may be unfamiliar with 
structured programming techniques, some of 
the diagramming methods may seem to be a
bit uncomfortable. One of the most often
seen technical errors is the attempted use of 
a GOTO statement on the diagram. The case 
statement shown in figure 3 illustrates this 
problem. 

Two of the occurrences of the GOTOs in figure 3 are incorrect and the other is ambiguous. The GOTOs in “PRODUCT CODE = A” and in “PRODUCT CODE = B” are unnecessary and incorrect. The default logical linkages will see to it that the appropriate steps are executed. The GOTO at “PRODUCT CODE = C” is unclear. If it is supposed to mean that we are to cease execution of this process and jump to the procedure “COMPUTE MARKET PRICE” to begin processing, then its usage is incorrect. If on the other hand it means that “COMPUTE MARKET PRICE” is a common utility routine and is described elsewhere in the system, then the GOTO is misleading. Instead, we should have written:

PRODUCT CODE = C  { COMPUTE MARKET PRICE
                  {   ...SEE PAGE #3

if the process was expanded on a different page of the diagram; or something like the words “...SEE ABOVE” or “...SEE BELOW” if that process appears elsewhere on the same page. The GOTO is a physical entity to be used at execution and is not a logical relationship, so it does not belong on a logical Warnier-Orr diagram.

SELECT DAY OF THE WEEK
{ MONDAY (0,1)
  ⊕
  TUESDAY (0,1)
  ⊕
  FRIDAY (0,1)
  ⊕
  OTHER (0,1)

Figure 4: Example of a case statement with processes that are mutually exclusive and mutually independent.

SELECT DAY OF THE WEEK
{ NOT (MONDAY, TUESDAY OR FRIDAY) (0,1)
  ⊕
  TUESDAY (0,1)
  ⊕
  MONDAY (0,1)
  ⊕
  FRIDAY (0,1)

Figure 5: When a case statement has mutually independent and mutually exclusive statements, the statements may be rearranged into any order without changing the logic of the diagram.

ROLL IS A "4"
{ PLAYER HAS NO BODY  { SKIP
  ⊕
  PLAYER HAS NO NECK  { SKIP
  ⊕
  PLAYER HAS NO HEAD (0,1)  { SKIP
  ⊕
  PLAYER ALREADY HAS TWO ANTENNAE (0,1)  { SKIP
  ⊕
  NOT (PLAYER ALREADY HAS TWO ANTENNAE)  { GIVE PLAYER AN ANTENNA

Figure 6: Although this is a working Warnier-Orr diagram, the case statements are not mutually independent.

IF ‘player has no body’
THEN...
ELSE IF ‘player has no neck’
THEN...
ELSE IF ‘player has no head’
THEN...
ELSE IF ‘player has two antennae’
THEN...
ELSE ‘give player one antenna’

Listing 1: Typical if-then-else structure for Warnier-Orr diagram of figure 6.

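Returning to the price case of figure 3: in a language with if/else constructs, the same logic needs no GOTO at all, because exactly one branch runs and control then falls through to END PRICE. The Python sketch below uses hypothetical names (determine_unit_price, price_field_1, compute_market_price); the error raised for an unknown code is also an assumption, since the figure shows only codes A, B and C.

```python
def determine_unit_price(product_code: str, price_field_1: float,
                         compute_market_price) -> float:
    """DETERMINE UNIT PRICE as in figure 3, written without any GOTO.

    Exactly one branch runs; control then reaches END PRICE, which here
    is simply the return statement.
    """
    if product_code == "A":
        unit_price = price_field_1           # unit price is in price field #1
    elif product_code == "B":
        unit_price = 7.00                    # unit price = $7.00
    elif product_code == "C":
        unit_price = compute_market_price()  # common routine described elsewhere
    else:
        # Hypothetical: the figure does not say what other codes should do.
        raise ValueError(f"unknown product code {product_code!r}")
    # END PRICE
    return unit_price


print(determine_unit_price("B", 3.50, lambda: 9.25))  # 7.0
```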
Another common technical mistake is 
one that is a little harder to catch, and is one 
that even professionals with this technique 
will make if they aren’t careful. Consider the 
case statement shown in figure 4. 

Note that in this case statement, not only 
are the processes outlined mutually exclusive 
(only one of the cases is true), but they are 
also mutually independent. That is, their 
order within the case statement does not 
matter. It would be just as correct for 
me to have written the diagram as shown in 
figure 5. 
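One way to convince yourself that the cases of figures 4 and 5 are mutually independent is to try every ordering. In the Python sketch below, the day tests and action strings are hypothetical stand-ins for the diagram's processes; select runs an if/else-if chain over the cases, and the loop checks that every ordering chooses the same action for every day.

```python
from itertools import permutations

# Hypothetical cases for SELECT DAY OF THE WEEK: exactly one test is true
# for any day, and the last one covers all the other days.
CASES = [
    (lambda d: d == "MONDAY", "monday processing"),
    (lambda d: d == "TUESDAY", "tuesday processing"),
    (lambda d: d == "FRIDAY", "friday processing"),
    (lambda d: d not in ("MONDAY", "TUESDAY", "FRIDAY"), "other-day processing"),
]

DAYS = ["MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY",
        "FRIDAY", "SATURDAY", "SUNDAY"]


def select(day, cases):
    """Run the first case whose test is true, like an if/else-if chain."""
    for test, action in cases:
        if test(day):
            return action
    return None


# Mutually exclusive and mutually independent cases: any ordering
# selects the same action for every day.
orders = list(permutations(CASES))
for order in orders:
    assert all(select(d, order) == select(d, CASES) for d in DAYS)
print(f"all {len(orders)} orderings agree")  # all 24 orderings agree
```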

In an earlier article, “Structured Program
Design” (Oct 77 BYTE), the game of BUG
was outlined. In the game, a die is rolled for
each player and each number of the die
corresponds to a part of the bug’s body; the
player finishing his bug first wins the game.
If a player rolls a 4, for instance, he is entitled
to one antenna. But he must have already
acquired a body, a neck and a head, in that
order, before he can receive an antenna. He
needs a total of two antennae if he is to
complete a bug.

Many people would try to code that process
as a case statement, as in figure 6. The
process in figure 6 certainly looks correct, 
and indeed, if you code it as a case state- 
ment, as in listing 1, it will even run correctly. 

However, this process is not a case state- 
ment. It is more properly called a pseudo- 
case statement, because each of its cases is 
mutually dependent. The cases cannot be 
reordered within the statement without 
destroying its logic. Notice that rearrange- 
ment of the case statement diagram as 
shown in figure 7 does not work at all. This 
arrangement will give the player an antenna 
anytime a four is rolled, until he has two 
antennae, regardless of whether or not he 
already has a body, a neck or a head. A more 
correct logical interpretation of the case 
structure we want is shown in figure 8. 
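The difference between figures 6 and 7 shows up as soon as both are run. This Python sketch, with a hypothetical Bug record and hypothetical function names, codes the pseudo-case of figure 6 as an if/elif chain, then the same five cases in the order of figure 7, and applies a roll of 4 to a new player who has no body, neck or head.

```python
from dataclasses import dataclass


@dataclass
class Bug:
    """Parts a player has collected (field names are illustrative)."""
    body: bool = False
    neck: bool = False
    head: bool = False
    antennae: int = 0


def roll_four_figure6(bug: Bug) -> None:
    """Pseudo-case of figure 6: correct only because of its order."""
    if not bug.body:
        pass                    # skip
    elif not bug.neck:
        pass                    # skip
    elif not bug.head:
        pass                    # skip
    elif bug.antennae == 2:
        pass                    # skip
    else:
        bug.antennae += 1       # give player an antenna


def roll_four_figure7(bug: Bug) -> None:
    """The same five cases, reordered as in figure 7."""
    if bug.antennae != 2:
        bug.antennae += 1       # reached before any body, neck or head test
    elif not bug.head:
        pass                    # skip
    elif not bug.body:
        pass                    # skip
    elif not bug.neck:
        pass                    # skip
    else:
        pass                    # player has 2 antennae: skip


a, b = Bug(), Bug()             # two players with no parts at all
roll_four_figure6(a)
roll_four_figure7(b)
print(a.antennae, b.antennae)   # 0 1
```

The reordered version hands an antenna to a player with no head, which is exactly the failure the text describes.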

You might also notice that since the bug
must have a body before it can have a neck
(and a neck before it can have a head), if we
merely check for the presence of the head,
we will be indirectly checking for the neck
and the body, so that figure 9 is an equivalent
structure.

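Because the game allows only a handful of states, the equivalence of figures 8 and 9 can be checked exhaustively. The Python sketch below (state tuples and function names are hypothetical) enumerates every reachable combination of body, neck, head and antenna count, and compares the nested tests of figure 8 with the head-only test of figure 9.

```python
from itertools import product


def valid_states():
    """Yield every (body, neck, head, antennae) the game can reach."""
    for body, neck, head, antennae in product(
            (False, True), (False, True), (False, True), range(3)):
        if neck and not body:
            continue            # a neck requires a body
        if head and not neck:
            continue            # a head requires a neck
        if antennae and not head:
            continue            # antennae require a head
        yield body, neck, head, antennae


def figure8(body, neck, head, antennae):
    """Nested tests of figure 8; returns the new antenna count."""
    if body:
        if neck:
            if head:
                if antennae != 2:
                    return antennae + 1   # give player an antenna
    return antennae                       # every other case: skip


def figure9(body, neck, head, antennae):
    """Figure 9: a head implies a neck and a body, so test only the head."""
    if head and antennae != 2:
        return antennae + 1
    return antennae


states = list(valid_states())
assert all(figure8(*s) == figure9(*s) for s in states)
print(len(states), "reachable states give the same result")
```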
Another common technical error is the misuse or lack of use of the (0,1) notation in conjunction with the exclusive OR, ⊕. Many times, people will simply write:

TEST
{ CONDITION A {
  ⊕
  CONDITION B {

By this they often imply the (0,1) notation with the use of the symbol ⊕ alone. Actually, this is not incorrect; in fact, for most people familiar with the diagrams, this notation seems to be just as clear. But for users not quite familiar with the Warnier-Orr diagrams it is probably best to go ahead and include the (0,1).

To conclude, I’ll reiterate a point made in
an earlier article: understanding a Warnier-Orr
diagram is very easy; creating one from
scratch is much harder than it looks.


ROLL IS A "4"
{ NOT (PLAYER HAS 2 ANTENNAE) (0,1)  { GIVE PLAYER AN ANTENNA
  ⊕
  PLAYER HAS NO HEAD  { SKIP
  ⊕
  PLAYER HAS NO BODY  { SKIP
  ⊕
  PLAYER HAS NO NECK  { SKIP
  ⊕
  PLAYER HAS 2 ANTENNAE  { SKIP

Figure 7: When the statements in figure 6 are rearranged as shown, it can be seen that the program fails to work as desired.





ROLL IS A "4"
{ PLAYER HAS A BODY (0,1)
  { PLAYER HAS A NECK (0,1)
    { PLAYER HAS A HEAD (0,1)
      { PLAYER HAS 2 ANTENNAE (0,1)  { SKIP
        ⊕
        NOT (PLAYER HAS 2 ANTENNAE) (0,1)  { GIVE PLAYER AN ANTENNA
      ⊕
      NOT (PLAYER HAS A HEAD) (0,1)  { SKIP
    ⊕
    NOT (PLAYER HAS A NECK)  { SKIP
  ⊕
  NOT (PLAYER HAS A BODY) (0,1)  { SKIP

Figure 8: This method of approaching the stated “bug” problem is more logically correct than that of figure 6. All of the statements at each level of the diagram are mutually exclusive and mutually independent.

ROLL IS A "4"
{ PLAYER HAS A HEAD (0,1)
  { PLAYER HAS 2 ANTENNAE (0,1)  { SKIP
    ⊕
    NOT (PLAYER HAS 2 ANTENNAE) (0,1)  { GIVE PLAYER AN ANTENNA
  ⊕
  NOT (PLAYER HAS A HEAD)  { SKIP

Figure 9: Since a bug must have a head in order to have an antenna, and a body and neck to have a head, the search process can be shortened by just checking for the presence of a head.




Top-Down Modular Programming


Albert D Hearn 


If you have done some programming,
you know that it’s one of the most en- 
joyable and satisfying parts of personal 
computer use. The very thought that the 
vast power in the small system’s processor 
is limited only by the program that you 
write for it is tremendously exciting. 

If you are new to the computer game, 
the programs you have written up to now 
have probably been relatively small and 
uncomplicated, but you have developed 
a lot of experience and confidence from 
them. Most likely you haven’t used any 
particular technique in designing and writ- 
ing your programs: you have probably 
approached program design in an informal 
way and relied upon your good senses to 
guide you in this unfamiliar task. You have 
probably also gained an understanding of 
the full capabilities of the instruction set 
and some of the little tricks (yes, ADDing 
a binary number to itself really does result 
in a left shift of one bit) which can be so 
useful. You are also capable of writing I/O
routines to do about any kind of data 
transfer you want. 
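The ADD trick is easy to verify. The Python sketch below assumes an unsigned 8-bit register whose carry receives the bit that falls off the top, as on typical 8-bit microprocessors; add_to_itself and shift_left are illustrative names, and the assertion checks every 8-bit value.

```python
def add_to_itself(value: int, bits: int = 8) -> tuple[int, int]:
    """Add an unsigned `bits`-wide value to itself; return (result, carry)."""
    total = value + value
    mask = (1 << bits) - 1
    return total & mask, total >> bits


def shift_left(value: int, bits: int = 8) -> tuple[int, int]:
    """Shift left one bit; the old top bit becomes the carry."""
    mask = (1 << bits) - 1
    return (value << 1) & mask, (value >> (bits - 1)) & 1


# Doubling and shifting agree for every 8-bit value, carry included.
assert all(add_to_itself(v) == shift_left(v) for v in range(256))
print(add_to_itself(0b1011_0101))  # (106, 1)
```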

So now you are ready to do a program 
which does something really useful. The 
program you have in mind is going to be 
larger and more complicated than those 
you have done previously. While you might 
not expect this, your previous informal 
methods of designing and coding might 
possibly be inadequate and could cause 
you much grief if you attempt to use them 
on a larger program. 

Hopefully, I can help you prevent these
kinds of difficulties by showing you in this 
article an easy to use method of designing 
and structuring larger programs which can 
greatly simplify your personal efforts, 
regardless of complexity. 


The Concept 


Someone once said, “To solve a complex
problem, simply break it down into a num- 
ber of less complex pieces, then proceed to 
solve it one piece at a time.” This approach 
has been used for many years in the design 
and building of electronic equipment. It 
results in a “building block,” or “modular”
construction, where each block or module 
does some distinct part of the total function 
of the equipment. For instance, think of the 
last time you saw a diagram of a radio re- 
ceiver. It was probably in the form of a set 
of separate blocks representing the RF 
amplifier, mixer, IF amplifier, and so on. 
The blocks were all connected with flow 
lines showing the sequence in which each 
equipment module processed a signal coming 
from the antenna. The diagram enabled the 
reader to understand the function of the 
radio one module at a time, in relation to
the whole radio. 

So how does the idea of using building 
blocks and solving problems piecemeal 
relate to the programming of personal
computers? The answer is that these same 
ideas are very applicable to programming 
and have been in use in commercial pro- 
gramming for a number of years. There is 
no reason that good use of them can’t be 
made in the amateur computer hobby also. 


Top-down Design 


Top-down design of microprocessor programs requires that you first have a clear notion about what it is that you want the program to do. You should ask yourself questions like, “What function do I want performed?”, “What input information is available?”, and “What output information or action do I expect?” When you can answer these questions, you’ve actually completed the highest level of design.

[Diagram: boxes arranged in a hierarchy, from level 1 (highest) through level 2 to level 3 (lowest).]

Figure 1: A basic top-down design diagram is a structure like this. The number of levels may vary, and the number of boxes may vary, but the basic idea is given by this prototype.

implied inputs:   checkbook, bank stmt, deposit slips, checks
                        |
                        v
                  balance checkbook
                        |
                        v
implied outputs:  comparison of checkbook and bank balances, errors, corrections

Figure 2: The first level of design is the act of saying “I want a program to do thus and so.” Here “thus and so” is defined to mean checkbook balancing.

The basic principle of top-down design procedure says that you start at a very high level of function definition and then progressively expand that function into more and more detail until you’re at a low enough level to begin coding your program. Actually, this is a very natural way to design solutions to any problem, but for some reason it was slow to be adopted in programming. The top-down method is different from bottom-up, where the concern is for coding and details before a real program design has been done. Bottom-up methods work on the “how” aspects of the program before the “what” aspects. An analogy of this method would be the building of a house, using no structural plans, by first laying down a convenient foundation and then gradually adding wood and stone until some desirable structure has evolved.

Let’s take an example of a function that could be performed on a microprocessor system for the purpose of illustrating the technique of modular, top-down program design. The function, monthly checkbook balancing, was selected because it is a process that is familiar to most of us and it contains all of the elements which make it a good example.

In order to design what you want the program to do, begin by drawing a multilevel design diagram like the one shown in figure 1. The diagram will describe what the program does at a number of different levels of detail, starting with the highest level, which is a single block describing the overall function. The next lower level of blocks breaks the higher level function into a number of more detailed subfunctions. The next level takes those blocks and breaks them into even greater detail, and so on. An important point to remember is that the total function of the program is represented at each level.

Figure 2 illustrates the first steps in the top-down design of your checkbook balancing program. The first block simply states that the program will balance your checkbook. There are no details in that block and it certainly doesn’t invite coding at this point in the design. For input, you know that you will have your checkbook entries, monthly statement from the bank, deposit slips and cancelled checks. The output you want is a comparison of your checkbook balance and the balance shown on the bank statement, after adjusting for recent deposits, service charges and outstanding checks. You also want to know where any errors were made and what corrections are required.

The second level of design, shown in figure 3, breaks the first level block into three major subfunctions. Although this subdivision could have been done differently in terms of the content of the second level blocks, the sum total of those functions always adds up to the entire function of the program. The idea is that you start the process slowly and don’t attempt to develop too much detail too soon. Keep the number of subfunctions small, five or fewer, under each function block. Don’t worry about the order in which these subfunctions will be performed in your program. Remember, you’re only concerned at this point about what is to be done, not how it is to be done.

balance checkbook
    balance deposits
    balance checks and charges
    compare bank and checkbook balances

Figure 3: Once the first level of design has been determined, the next level is specified by breaking up the task into parts which are fundamentally independent of one another. Here, checkbook balancing is viewed as three separate modules of function.

balance checkbook
    balance checks and charges
        match cancelled checks to checkbook entries and bank statement
        adjust checkbook balance for any bank charges
        adjust bank balance for any outstanding checks

Figure 4: Carrying the process one step further, the next level is shown here for one of the branches of the structure of the program.

match deposits in checkbook against bank statement
    |
    v
adjust bank balance for any late deposits
    |
    v
match cancelled checks to checkbook entries and bank statement
    |
    v
adjust checkbook balance for any bank charges
    |
    v
adjust bank balance for any outstanding checks
    |
    v
compare bank balance to checkbook balance
    |
    v
determine any differences and correct mistakes

Figure 5: After the modular structure of the application is determined in a hierarchy such as those exemplified in figures 1 to 4, then attention can be given to sequencing of functions. This flowchart shows general level sequencing of the checkbook balancing application.

Next, take the design to the next lower 
level by further subdividing each of the 
second level blocks. Figure 4 illustrates a 
portion of this step. Just make sure that 
each subblock represents a complete sub- 
function and that the subfunctions at any 
level are equivalent to the program function. 

You might ask at this point, “How many 
levels must I go through?”, or “How do I
know when to stop?” There is no precise 
answer to these questions, although the fol- 
lowing guidelines should help. In general, 
you will find that you should stop when the 
lowest level of functions is so simple that 
you can easily write a program module to 
do each one. A module should have about
50 program instructions or fewer.
Experience will help you to know when
you have reached this point. Also, you will 
find that the more complex the program,
the more design levels you will need; general-
ly, about three or four levels will be
sufficient.

Another method of determining if you've 
carried the design to a low enough level 
comes about almost automatically. If you 
are attempting to complete one of the lower 
levels and you find that the order of sub- 
function execution is becoming difficult to 
ignore, then you've probably gone far 
enough. Also, if you find that it is becoming 
necessary to show that program branching 
or decision making is required (top-down de- 
sign diagrams should show no decision logic), 
then you know that you have about the 
right level of design. You are now ready to 
start thinking about the “how” of your
program. 


Modular Construction


If you try to make each block at the 
lowest level of your design diagram into a 
module, you might determine that some 
blocks are simple and can be combined 
into fewer modules. On the other hand, 
there will probably be blocks which would 
result in modules larger than the maximum
size of 50 instructions we have established.
In this case, take the blocks through one or
more additional levels of design.

Now decide what sequence the functions 
should be performed in. Begin drawing a 
flowchart showing the required sequence. 
Will each function be performed for each 
pass through the program? If not, add deci- 
sion blocks showing the conditions under 
which each such function is executed. Also
add any function blocks which may be
necessary to initialize data, clear tables,
perform I/O, etc.

Figure 5 shows a sequence of functions 
which results from the design of your exam- 
ple checkbook balancing program. Actually, 
the functions shown are probably too high 
level for this step, but for the sake of illustra- 
tion, the diagram should make the point. 

At this time, I would recommend that you consider making use of a special program structure called an executive routine, which offers some significant advantages. The executive is the main routine in the program and primarily contains calls to the function modules which do all the processing duties. It makes all decisions about the sequence of execution. It also contains the starting and ending points of the program. The objective of the executive is to concentrate most of the decision logic and common functions of the program into a separate routine which becomes another program module.

In this way, the function modules need 
not, and should not, make sequencing deci- 
sions. They should never directly pass con- 
trol to another function module. This should 
be done only through the executive. A func- 
tion module’s only responsibility is to be 
given control by the executive, do its 
assigned job, and then return control back to 
the executive. Function modules are written 
in the form of subroutines using the call and 
return facilities of the programming language 
being used. They should also contain a
generous sprinkling of comment statements
to ensure a high degree of understandability,
as well as a well-defined I/O interface to the
outside world and the rest of the program.

Figure 6 illustrates the final step in the 
modular, top-down design of your check- 
book balancing program. You have added an 
executive routine and some necessary house- 
keeping routines. You could begin coding 
the program from this flowchart by first 
writing the executive and the associated 
subroutine calls for each of the processing 
modules. By writing dummy subroutines 
which simply return control when they are 
called, you can test your executive for cor- 
rect operation without the need for the real 
processing modules. 
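As a rough illustration of the executive idea, here is a minimal sketch in Python (used here only as a stand-in for whatever language you program in). The module names initialize, post_deposit, post_check and print_balance are hypothetical, since the actual blocks of figure 6 are not reproduced here, and negative entries are assumed to be checks. Every sequencing decision sits in executive; the dummy modules only record that they were called and return, which is enough to test the executive before any real processing module exists.

```python
# Sketch of an executive routine driven by dummy (stub) modules.
# Module names are hypothetical stand-ins for the blocks of figure 6.


def make_stub(name, log):
    """Return a dummy module: it records that it was called, then returns."""
    def stub(state):
        log.append(name)
    return stub


def executive(entries, modules):
    """All sequencing decisions live here; modules never call each other."""
    state = {"entries": list(entries), "current": None}
    modules["initialize"](state)
    while state["entries"]:                  # loop decision made by the executive
        state["current"] = state["entries"].pop(0)
        if state["current"] < 0:             # branch decision made by the executive
            modules["post_check"](state)
        else:
            modules["post_deposit"](state)
    modules["print_balance"](state)
    return state                             # single exit point


# Test the executive's sequencing before any real module is written.
calls = []
names = ["initialize", "post_deposit", "post_check", "print_balance"]
stubs = {n: make_stub(n, calls) for n in names}
executive([100, -30], stubs)
print(calls)  # ['initialize', 'post_deposit', 'post_check', 'print_balance']
```

When a real module is finished, it simply replaces its stub in the table; the executive itself does not change.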

The next step, of course, is writing the processing routines. This is simplified by the design approach described in this article because it allows you to work on each routine as a separate unit which can be written and tested independently of all other routines in the program. When all routines are completed, they simply plug into the executive to form a total program. Later, if you want to change the sequence of execution, or add or delete functions, it can be simply a matter of manipulating modular routines.

Figure 6: While the sequencing of the diagram shown in figure 5 is adequate, it is often useful to explicitly partition all sequencing of execution in a separate module called the "executive" for the application. This flowchart shows a simple example of such an executive program which sequences the major operations of the application.
Some Words About Program Structure


Microprocessor programming, at this 
point in time, is a black art. Once you have 
learned the basic instruction set, you’re on 
your own. Some people get the knack of 
this mysterious task fairly quickly, and 
some do not. Those who do well seem to 
have developed some sort of system for 
going about it. The point is that an or- 
ganized, systematic approach is required 
if there is any hope for continued program- 
ming success. The purpose of this article 
is to describe to you one such method 
which has become very popular with pro- 
grammers of all types, using all kinds of 
computers from micros to the giants. 


Concept 


What we’re looking for is simplicity in 
the writing of programs. This is usually 
achieved if the program can be reduced 
to a collection of basic components which 
fit together in very well-defined ways. This 
is the concept behind structured program- 
ming. 

Any program can be considered to have 
only two basic building blocks. One is the 
process block shown in figure 1. It simply
performs some defined function, or proc- 
ess. It might represent a simple function 
requiring only a few, maybe only one, in- 
structions in the program, or a much larger 
function requiring many instructions. What- 
ever it does, it has one input and one output. 

The second basic block is the decision 
block shown in figure 2. This elementary 
capability of any computer is that which 
gives it all its power and flexibility. It is
the ability to alter the path taken by the 
program based upon the value of some 
parameter or condition which can be tested 
by certain instruction types. For example, 
two numbers can be compared and a test for 
equality used to decide which of two pro- 
gram paths will be taken as a result. 

These two fundamental building blocks 
will now be used in the construction of a 
set of basic program structures with which 
any other program can be built. The three 
general structures are called sequence, if- 
then-else and loop. Variations of these will 
be examined, as well as combinations which 
can be used to build more complex functions. 





Figure 1: The process block is the "black box" of programming: it is entered by a single input path, does some arbitrary operations upon data, and is exited by a single output path. The "arbitrary operations" can be as simple as one step in an arithmetic calculation, or as complex as a compilation of a program; it all depends on the point of view taken.

[Diagram: a process block with a single input path and a single output path; the assumed flow of execution runs from top to bottom.]


Figure 2: The decision block is a simpler concept than the process block, in the sense that the amount of computation required rarely approaches the generality of an "arbitrary process." A decision block has one input and, depending upon a binary condition, takes one of two output paths. In this figure, the names "true," "then" and "yes" denote one possible path; the names "false," "else" and "no" describe the other possible path. In programming languages, the "then" or "else" terminology for the two paths is frequently built into the language design; the other terms are frequently seen in flowchart representations of programs.
Figure 3: The sequence structure is the simplest programming structure. It can be viewed from the outside as the equivalent of a process block, but upon close examination it is found to contain one or more process blocks.

Default flow: Vertical flow is from the top of a diagram toward the bottom, and horizontal flow is from the left of a diagram towards the right, unless explicit flow is shown by arrows.

The simplest of the program structures,
shown in figure 3, is the sequence structure, 
which is composed of one or more process 
blocks strung together serially. Like the 
process block from which it is built, the 
sequence structure has only one input path 
and one output path. In fact, you will soon 
see that one of the rules that we want all 
structures to conform to is that they have a 
single input path and a single output path. Furthermore, an entire program, which can be represented by one large process block, should also conform to this rule.

The next structure is the if-then-else structure, shown in figure 4. It consists of a decision block and two process blocks. The condition represented by the decision block determines which process block is chosen. Notice that regardless of which path is taken there is one common exit path from the if-then-else structure. This is required to maintain our single exit philosophy.

Figure 4: The if-then-else structure is composed of a decision block and two process blocks. The process blocks may themselves be viewed as any form of structure with a single input and a single output path, and thus might in fact be sequence structures, if-then-else structures, etc.

An if-then-else structure does exactly what it says: if a condition is true, then take a specified action, else take a specified alternate action. However, there are times when an action is required in only one of the paths and no action is necessary in the other. In an actual flow diagram, this is of course shown by drawing a flow line in place of one or the other process block of the if-then-else structure, since the most trivial process is simply going to the next process without doing anything. Note, however, that only one of the process blocks can be made up of this simplest case of "do nothing," since if both process blocks were eliminated from the if-then-else structure, the net effect would be to "do nothing" all the time, whether the condition was true or false.

The if part of an if-then-else structure is simply any program instruction which can perform a test and take one of two paths depending upon the outcome. In an assembly language, this is usually a conditional jump or branch instruction based upon the outcome of some comparison, arithmetic operation or other operation which affects the processor status flags used in such branches and jumps. The branching instruction specifies the destination address of the beginning of one path (whether it is the then or the else leg is arbitrarily defined), and the next sequential instruction is assumed to begin the opposite path.

Some higher level languages like BASIC have ready-made if-then-else instructions. BASIC has IF and THEN; ELSE is implied. The following shows how an if-then-else would look in BASIC:

    1  IF X=Y THEN 10
       ...  (false part)
       GOTO 15
    10 ...  (true part)
    15 ...

In this example, the else code immediately follows the IF instruction. The GOTO 15 ends the else path and causes the program to branch to the common exit point at line 15. The then path starts at line 10 and ends at line 15. [BASIC is considered to be an "unstructured" language because of the need for an explicit GOTO following the "false part" of an IF-THEN-ELSE construction.]

If you use assembly language in your programming, and your assembler has a macroinstruction capability, then you can write your own if-then-else macros. It is beyond the scope of this article to describe how this is done, but it isn't very difficult.

If you use assembly language and don't have facilities for writing macros, then you can simulate the function of the macroassembler in order to gain the advantages of structured programming. Simply sit down and write yourself a set of standard if-then-else structures. Take the five or six most common decision types (equal, not equal, zero, greater than, etc) and write skeleton programs for each. Leave blanks for the actual condition to be tested, and leave space for the actual code which will perform the then and else functions. Later, when you need an if-then-else while writing a program, you can draw upon your set of prewritten structures. Not only does this eliminate your having to invent similar program sequences over and over again, but it also prevents many bugs and greatly eases the effort you have to put into program writing.

The last basic structure is the loop, which provides a means of repeating a sequence of instructions until some stop condition is found to exist. There are two kinds of loop structures: do-until and do-while.

A do-until structure, shown in figure 5, 
performs the function in the process block 
at least once. After that, a test is done to 
determine if the condition for stopping the 
process looping has been found true. As 
long as the condition is not true, the loop- 
ing continues. When it becomes true the 
looping ends and the exit path is taken. 
This type of structure can be used, for 
example, when you need to search a table of 
values, looking for a particular value. If you 
know that the table will always contain a
matching entry, the program routine need
not be complicated by logic to detect
end-of-table before a matching value is
found. Notice that the first table entry is 
always examined before the decision is made 
to continue (this is because the ending 
condition decision is based upon the value 
of that entry). 
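A do-until table search can be sketched in Python, which has no built-in do-until, so the common while True ... break idiom stands in for it. The function name find_index is illustrative only. As in the text, the sketch assumes the table always contains a matching entry, so there is no end-of-table test; if that assumption is broken, indexing past the end raises an IndexError.

```python
def find_index(table, wanted):
    """Do-until search; assumes `wanted` is present somewhere in `table`."""
    i = -1
    while True:
        # Process block: step to the next entry and examine it.
        i += 1
        entry = table[i]
        # Until condition: tested only after the entry has been examined.
        if entry == wanted:
            break
    # Single exit point from the structure.
    return i


print(find_index([7, 3, 9, 3], 3))  # 1
```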

The second type of loop is the do-while, shown in figure 6. The difference between this and the do-until structure is that the test is done before the process block is executed. In many cases this difference is not very significant, because both types of structures can do the same jobs.

In specific situations you will find that one form will usually be better suited or more convenient than the other. The primary difference to remember is that the do-until form always executes the process block at least once, whether or not the until condition is true, and that the do-while may not execute the process at all if the while condition is false at the time of the first test. Experience will best teach you which to use in the various situations.
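The difference between the two loop forms is easy to see by counting executions. In this Python sketch, do_until and do_while are hypothetical helper names that take the process block and the condition as functions. When the stopping condition already holds at the start, the do-until body still runs once, while the do-while body never runs.

```python
def do_until(process, until):
    """Do-until: run the process block, then test; always runs at least once."""
    while True:
        process()
        if until():
            break


def do_while(while_cond, process):
    """Do-while: test first; runs zero times if the condition starts false."""
    while while_cond():
        process()


runs = []
do_until(lambda: runs.append("until"), lambda: True)   # condition already met
do_while(lambda: False, lambda: runs.append("while"))  # condition already false
print(runs)  # ['until']
```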

A variation of the loop structures of either form might be considered: the endless loop, or do-forever. This form of loop occurs when the while or until condition is never changed to allow execution of the output path of the structure. Intentional endless loops are occasionally used, as in the low level programming trick of hanging up execution in a tight loop to flag errors, or the quite legitimate endless loops which form the outer level of control of a typical executive or monitor program. But for most programming purposes, an endless loop is a bug or error in the program.


An Example 


Now using the basic structures, we can 
construct a program of any size and com- 
plexity by combining and nesting in any 
manner as long as some fundamental rules 
are adhered to: 


• The program as a process should have only one input path and one output path.

• Structures within the program can be nested, but each structure must be totally contained within the structure in which it is being nested (this will be illustrated later).

• There should be no branching unless it is part of a structure (for example, the GOTOs required in languages like BASIC).

• Refrain from attempting to optimize the program by violating the above rules. There is a right time for this later.


Figure 5: The do-until structure is a looping form whose purpose is to execute a given process block at least once. After executing the process block, the "until condition" is tested and, if found to be false, execution loops back to repeat the process block before testing the condition again.

Before we look at an example of structuring a program, let's first look at how nesting of basic structures works. Figure 7 shows a flowchart of a program which, overall, could be represented by a single if-then-else. But when it is looked at in more detail, the else leg contains another if-then-else as part of the instruction sequence there; the else leg of that structure contains yet another if-then-else. The heavy outlines show that each of the nested structures is totally enclosed by its parent structure; there is no overlap. A BASIC-like program to perform the function shown in figure 7 appears as listing 1. Again, I use outlines to illustrate that each structure is embedded in its entirety within another higher level structure. Notice that I have used indentation of lines to increase the readability of the program. Each separate structure should be at a different level of indentation than its parent.
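For comparison, the nested structure of figure 7 and listing 1 can be sketched in Python, where processes Q, R and S are stand-ins that just record their names. Because every then leg in figure 7 is empty, each test is written in its inverted form (x != y rather than x == y, and so on), which is what the BASIC IF ... THEN line accomplishes by jumping over the else code. The indentation shows each structure enclosed entirely within its parent.

```python
def figure7(x, y, a, b, c, d):
    """Nested if-then-else structures of figure 7; returns the processes run."""
    done = []
    if x != y:                    # else leg of "X=Y?" (then leg is empty)
        done.append("Q")          # process Q
        if a <= b:                # else leg of "A>B?"
            done.append("R")      # process R
            if c >= d:            # else leg of "C<D?"
                done.append("S")  # process S
    return done                   # single common exit


print(figure7(1, 2, 1, 3, 4, 3))  # ['Q', 'R', 'S']
```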
Figure 6: The do-while structure is a looping form whose purpose is to execute a given process block only if the "while condition" is true. Thus it can execute the process block zero times if the condition is false initially, or an arbitrary number of times so long as the condition remains true during repeated execution of the process block.
Figure 7: The various types of structures can be nested by noting that any place where a process block is indicated, a more complex structure can be used, since it, too, has only one input path and one output path of execution. Thus, for example, this flowchart shows nesting of a process Q block and an if-then-else structure as the else part of the if-then-else structure with condition X=Y?. This second if-then-else in turn has a third if-then-else as part of its else part. The outlines show the nesting of one structure within another.

Listing 1: A BASIC-like program equivalent for the flowchart of figure 7. The lines in the picture emphasize the structured programming formalism.

       IF X=Y THEN 32
          process Q
          IF A>B THEN 31
             process R
             IF C<D THEN 30
                process S
    30       CONTINUE
    31    CONTINUE
    32 CONTINUE


Figure 8: An unstructured flowchart performs an endless process as might be implemented in an automobile interlock. This is a complete and viable solution of the problem, but it involves numerous branching operations performed in an uncontrolled (GOTO) fashion.
Figure 9: Taking the algorithm of figure 8 and casting it into a standardized, structured programming form eliminates all GOTO operations in languages with a complete if-then-else structure, and in languages like BASIC, reduces use of GOTO operations to standardized structures. In this flowchart, we've positioned all the blocks to emphasize the nesting of structure. One of the primary reasons for the emphasis on structured programming is the communication of ideas to other programmers (or to the originating programmer at a later date). The claim is made that a flowchart like this one, and its equivalent representation in listing 2, provide a standardized way of communicating algorithms which makes the listing or chart easier to understand and read.
Listing 2: A BASIC-like application program for activating a buzzer of an automobile given several conditions. A subroutine BUZZ is indicated (by a call with the keyword GOSUB) to actually sound a noise during the loop. In this BASIC-like representation, several liberties with syntax have been taken.


Let’s look now at an example of a simple 
program and show how a structured version 
might differ from an unstructured version. 

The program is one which might be part 
of a future automobile computer control 
system using a microprocessor. Its purpose 
is to trigger a buzzer if the ignition key is 
left in the lock when the left front door is 
opened, or if the headlights are left on 
when the key is not in the lock. A delay is 
performed before conditions are checked 
again. 

The flowchart in figure 8 shows how we 
might have drawn it without attempting to 
apply any of the principles of structured 
programming. Now, look at figure 9 which 
shows the structured version. Both forms 
of the program do the same function, but 
the structured form is clearly more straight- 
forward and easier to write code from. 

Basically, a number of things happened 
to the flowchart when it was structured. 
First, all the branches (or GOTOs) became 
forward branches except those in loop 
structures. This allows for reading the chart 
from top to bottom in an orderly way. 
Secondly, each decision block and process 
block has been put into a proper structure 
and nested totally within its parent struc- 
ture. Thirdly, every structure regardless 
of its place in the overall program has only 
one input and one output. 

One thing has happened that might appear to be a little strange to you. The sequence structure which performs the buzzer function appears twice now, where it only appeared once before. This is necessary in order to keep the structure clean. Remember, you cannot simply branch into the other buzzer block because those two structures would then overlap. The inefficiency implied by the double appearance of that block might bother you, but it will probably turn out that the block will be written as a subroutine and the only inefficiency will be an extra call instruction.

    1  IF KEY≠ON THEN 7
    2  LET X=0
    3  LET X=X+1
    4  IF X=5000 THEN 6
    5  GOTO 3
    6  GOTO 13
    7  IF KEY=INLOCK THEN 11
    8  IF LIGHT≠ON THEN 10
    9  GOSUB BUZZ
    10 GOTO 13
    11 IF DOOR≠OPEN THEN 13
    12 GOSUB BUZZ
    13 CONTINUE
    14 CONTINUE
    15 CONTINUE
    16 GOTO 1

Listing 2 is a BASIC-like program for the structured flowchart. (Here "BASIC-like" means using the syntax of BASIC but allowing variable names to be many characters in length for purposes of illustrating their meaning.) I have not attempted to make the program complete and have taken some liberties in order to illustrate my points.

A few words of explanation are in order. First, the instructions at lines 3, 4 and 5 represent a do-until structure which is used to implement a delay by simply incrementing a counter (X) until it reaches a large value. The name BUZZ represents the line number of a subroutine (not shown) which activates an electronic buzzer in the car's dash.
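The same interlock logic can be sketched in Python, following the prose description rather than every line of listing 2: buzz if the key is in the lock when the door is opened, or if the lights are on when the key is not in the lock. The sensor tuple and the names check_once and monitor are hypothetical; a real controller would read hardware instead of calling functions. As in the structured flowchart, the buzzer call appears in two legs instead of being shared by a branch, and each copy costs only one extra call to the same subroutine. The passes argument is added purely so the otherwise endless outer loop can be tested.

```python
def check_once(sensors, buzz):
    """One pass of the structured check.

    `sensors` is a hypothetical (key_in_lock, door_open, lights_on) tuple.
    """
    key_in_lock, door_open, lights_on = sensors
    if key_in_lock:
        if door_open:
            buzz()          # first copy of the buzzer block
    else:
        if lights_on:
            buzz()          # second copy: same subroutine, one extra call


def monitor(read_sensors, buzz, delay, passes=None):
    """Outer do-forever loop; `passes` limits it for testing (None = forever)."""
    count = 0
    while passes is None or count < passes:
        check_once(read_sensors(), buzz)
        delay()             # stands in for the counting delay loop
        count += 1


readings = iter([(True, True, False), (True, False, True), (False, False, True)])
sounded = []
monitor(lambda: next(readings), lambda: sounded.append("buzz"), lambda: None, passes=3)
print(len(sounded))  # 2
```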

Now is the time to go back and look at the program to make it more efficient in its operation or in the amount of memory required. This should be done only if it is absolutely necessary. If it is necessary, try to maintain the structuring to the extent that it doesn't destroy the clarity of the program or increase its complexity. In our example program, notice that there are three CONTINUE instructions at lines 13, 14 and 15 leading to a GOTO at line 16. The speed of the routine can be improved and the memory requirements can be reduced by eliminating the CONTINUEs and changing any instruction which references any of them to go to line 16. Alternatively, you could change each of those references to go directly to line 1, although you would be seriously interfering with the intent of structuring.

In conclusion, I invite you to try the techniques described in this article when you write your next program. If you have done it any other way before, it takes a little getting used to, but I think you will ultimately agree that it has a lot to offer. Hopefully, you will see the benefits in the form of less time spent getting your program designed, written and debugged. In short, I believe that it can help make programming even more enjoyable.


Applied Structured Programming

... and How to Use It: Part 1


Regardless of whether you're a newcomer to computers or a devoted computer enthusiast who occasionally manages to dream in hexadecimal, one thing is true: there's always room for improvement in your programming. Unless you are exceptionally orderly, you probably dive right into flowcharting or coding a problem, erase and do a lot of filling in when you remember something you hadn't thought of earlier, wind up with a final program where insertions choke your original code like weeds in a garden, and spend too much time correcting mistakes you think you shouldn't have made. And when you finish the program that (you think) is finally running, you complain to yourself, "There has got to be a better way!"

There is, and it’s called structured pro- 
gramming. No, it isn’t a language, and it 
isn’t just a way to write a program. It’s a 
philosophy of design that pays close atten- 
tion to how a program comes to be written 
and tries to suggest ways to do each step 
more efficiently. 

Structured programming evolved less than ten years ago, when computer programmers finally faced programs too big to understand, when the cost of writing and debugging software began to exceed the expense of using extra hardware and computer time. The school of structured programming evolved after E W Dijkstra voiced several thought-provoking opinions, one of which was that many of our programming problems are caused by (over)use of the unrestricted GOTO statement, which is present in every high level language from COBOL to FORTRAN. And many people nodded their heads enthusiastically, for who hasn't traced the bug in a program to an unexpected juxtaposition of GOTOs?

Gregg Williams

Structured programming seems to help with the habitual problems of even the most conscientious programmer, problems like how to write and debug a large program, how to fix (or, better, to keep from happening in the first place) bugs in programs that crash after working correctly for months or years, or how to add to an already working program without causing unexpected side effects. But how do you do structured programming in a language that permits unlimited GOTOs?

Specifically, how do you do structured programming in BASIC, which is the universal language for the microcomputer user? Simple: you use GOTOs to implement the three basic structured programming structures that can theoretically represent any problem (sequence, do...while and if...then...else) and use GOTOs for nothing else.

What’s the catch? You have to make 
yourself do it. The trade-off is simple: 
some discipline, a bit more planning in the 
early stages of designing a program, maybe 
a few extra lines of code in exchange for less 
total time spent in programming and getting 
a new program to work, less time spent de- 
bugging, and less chance of unexpected 
“blowups” happening later. It does seem to 
take more time, but that’s because your 
lazy brain is protesting the exercise of little 
gray cells in thinking out a program and 
applying discipline; but the total time can 
be less, and the total frustration less. 

I've been doing structured programming for some time as I write this (I got into it largely due to the complexity of the programming job I have at work), and now I wonder why anyone gets let out of programming classes without learning structured programming as the Gospel according to Dijkstra. Still, I've found several places where by-the-book structured programming is a bit awkward; and since I do have GOTOs to work with in BASIC, I found it hard to justify not using them when they can be used to simplify a program while still retaining the properties of "straight" structured programs.

This article, then, will deal with using structured programming in BASIC (with emphasis on the word using) and with some extra programming structures I have found useful.

Figure 1: Generating the inverse of a conditional expression. When converting structured pseudocode to BASIC code, the logical inverse of a conditional expression must often be inserted in an IF statement. At (a) is a table where each line contains two symbols, each of which is the inverse of the other. The rule for creating the inverse of a given expression is to replace every occurrence of the eight symbols in (a) with its inverse; (b) shows some examples of this.

    (a)  <      ≥
         >      ≤
         =      ≠
         AND    OR

    (b)  X3 > 5                  X3 ≤ 5
         X = 5 or Y < 0          X ≠ 5 and Y ≥ 0
         R > (S+3)               R ≤ (S+3)
         K2 ≥ K3 and K2 ≥ K4     K2 < K3 or K2 < K4


Some Preliminaries 


Before I begin, I need to get two points out of the way. The first has to do with a property of the three basic control structures that must be duplicated by any proposed control structure for the latter to be suitable for a structured program. (I'm pointing this out to justify my additions and modifications to the three basic control structures.) In a word, each of the three structures in strict structured programming has the property called "one-in, one-out." This means that every time control of the program passes through this block (I will be calling the code between the boundaries of a control structure a "block"), it always starts with the same (first) statement and always exits through the same (last) statement; in other words, only one way in, only one way out. This allows a program to be constructed like a series of beads on a string, each of which can be examined and changed without inadvertently changing any of the others. (This is another property of structured programming control structures: the functional independence of each module. Given the same input, a module should perform the same operations regardless of what has happened in previous blocks.)

The second point is simply one of defini- 
tion. In structured programming, we have 
situations where a block of code is done 
when a certain relationship holds true; 
if it is false, we do not do that block of 
code. These relationships, called conditional 
or relational expressions, are true or false 
depending on the current values of variables 
contained in the expression; examples are 
X1 = Y1, B>3, D+K1 ≠ K2. In structured


programming, we do a block of code when a 
certain condition is met; to express this in 
BASIC, we must use the IF statement to 
branch around the same code when the 
condition is not met, that is, when the logi- 
cal opposite or inverse of the same ex- 
pression is true. 

Several examples will help you here. When is X>5 false (for what values of X)? When is K1≠3 false? When is C≥100 false? The answers are, respectively, when X≤5, K1=3, and C<100. Why? Because the opposite of "greater than" (as in X>5) is "less than or equal to"; the opposite of "not equal to" (as in K1≠3) is "equal to"; and the opposite of "greater than or equal to" (as in C≥100) is "less than." And the converse is true as well; that is, the opposite of "less than or equal to" is "greater than," and so on. This also works with interchanging the logical connectives AND and OR (for example, the opposite of "G>5 AND A1=0" is "G≤5 OR A1≠0"). The justification for this line of reasoning can be found in any book on elementary logic (see DeMorgan's Law, or DeMorgan's Theorem, as it is variously known).

Because of all this, a simple table (see figure 1) allows us to find the inverse of a conditional expression. The rule to apply is: for a given expression, replace every occurrence of the symbols <, >, ≤, ≥, =, ≠, AND and OR with the other symbol in the same row; the new expression is now the inverse of the first conditional expression.
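The substitution rule is mechanical enough to automate. This Python sketch (the name invert_condition is hypothetical) swaps each symbol for its partner from the table, using the ≤, ≥ and ≠ characters as printed here. One caveat follows from DeMorgan's Law: the plain symbol swap is only safe when an expression uses a single kind of connective or is fully parenthesized, because swapping AND and OR in a mixed, unparenthesized expression changes how the terms group.

```python
import re

# Each symbol and its inverse, as in the figure 1(a) table.
INVERSE = {
    "<": "≥", "≥": "<",
    ">": "≤", "≤": ">",
    "=": "≠", "≠": "=",
    "AND": "OR", "OR": "AND",
}

# A whole-word AND/OR (any case) or a single relational symbol.
TOKEN = re.compile(r"\b(?:AND|OR)\b|[<>≤≥=≠]", re.IGNORECASE)


def invert_condition(expr):
    """Return the inverse of a relational expression by symbol substitution.

    Valid only when AND and OR are not mixed without parentheses.
    """
    def swap(match):
        token = match.group(0)
        inverse = INVERSE[token.upper()]
        # Keep lowercase connectives lowercase ("and" becomes "or").
        return inverse.lower() if token.islower() else inverse
    return TOKEN.sub(swap, expr)


print(invert_condition("G>5 AND A1=0"))  # G≤5 OR A1≠0
```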


Putting It in BASIC 


Once we have the three basic control structures of sequence, if...then...else and do...while, we can look back to the moment before their invention and say, yes, because we are time-bound creatures tied to the idea of serial or time-ordered cause and effect, how else could we do anything? One either does tasks in sequence, or does a task until it no longer needs doing, or does one thing if something is true and another thing if it is not. How else can you decide on how to do a thing? (Unfortunately, when people stopped doing things by hand and programmed a computer to do them "for them," this intuitive causality was sometimes left behind. It's fitting that we returned to this intuitive causality only when the problem of writing computer programs was itself attacked as a problem.)

The three basic control structures written in convenient pseudocode are listed with their flowchart equivalents in figure 2. Note that, in an if...then...else sequence, in no way can both blocks 1 and 2 be done during the same pass, and that the decision whether to do a block 1 through n times in the do...while is made at the beginning of the block, so that it is possible for a do...while block to do the enclosed blocks zero times.

Given these three control structures, every problem must be broken into varying levels of subproblems, each of which can ultimately be expressed as a combination of straight sequence, if...then...else sequences and do...while loops. How would you initially break these problems down using the above control structures?


1. Given a number N, print the number, its reciprocal, and -1 times the number;

2. Average five test scores A, B, C, D, and E to an average of V, including a 5-point curve if the initial average is below 70 points;

3. Print out the reciprocals of the numbers 1, 2, 3, ... while the reciprocals are greater than 0.005.


Figure 2: The three basic control structures, including pseudocode and flowchart equivalents: (a) sequence, (b) if...then...else, (c) do...while. A “block” of code is one or more statements and/or do...while and if...then...else structures. (Flowcharts not reproduced.)

(a)
    block 1
    block 2
      .
      .
    block n

(b)
    if (condition)
    then
        block 1
    else
        block 2
    endif

(c)
    do while (condition)
        block 1
          .
          .
    endwhile

Figure 3: Three problems with their solutions in structured pseudocode (see text).

(a) Given a number N, print the number, its reciprocal, and -1 times the number:
    print N
    print 1/N
    print -N

(b) Average five test scores A, B, C, D, and E to an average of V, including a 5-point curve if the initial average is below 70 points:
    V=(A+B+C+D+E)/5
    if V<70
    then V=V+5
    endif

(c) Print out the reciprocals of the numbers 1, 2, 3, ... while the reciprocals are greater than 0.005:
    N=1
    do while (1/N) > 0.005
        print 1/N
        N=N+1
    endwhile
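For readers who want to run the three answers of figure 3, here is one possible rendering in Python (a stand-in language; the function names are hypothetical). Each function keeps the shape of its pseudocode: straight sequence, an if...then with no else, and a loop whose test is made before every pass.

```python
def problem_a(n):
    # Sequence: three actions done one after another.
    return [n, 1 / n, -n]


def problem_b(a, b, c, d, e):
    v = (a + b + c + d + e) / 5
    # if...then with an empty else branch: add the 5-point curve.
    if v < 70:
        v = v + 5
    return v


def problem_c():
    reciprocals = []
    n = 1
    # do...while: the test is made before each pass through the body.
    while 1 / n > 0.005:
        reciprocals.append(1 / n)
        n = n + 1
    return reciprocals


print(problem_a(4))                    # [4, 0.25, -4]
print(problem_b(60, 62, 64, 66, 68))   # 69.0
print(len(problem_c()))                # 199 (stops when 1/N reaches 0.005 at N=200)
```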
(a)
    if (condition)
    then
        block 1
        block 2
    else
        block 3
    endif

(b)
    110 IF (inverse of condition) GOTO 270
    120
     .      (blocks 1 and 2)
    250
    260 GOTO 520
    270
     .      (block 3)
    510
    520 (next statement after IF)

Figure 5: The if...then...else structure: (a) pseudocode, (b) BASIC equivalent. The first statement (here line 110) is always an IF statement branching on the inverse of the condition given in the pseudocode; the branch is made to line 270, two lines past the last line performed by the “then” branch (line 250). The next statements are the code represented by the “then” branch (here lines 120 to 250), followed by a GOTO to the first statement after the “else” code (here line 260, branching to line 520). After the GOTO is the code for the “else” branch of the if...then...else statement (here lines 270 to 510). In actual practice, of course, appropriate line numbers would be used in the BASIC program.


(a)
    do while (condition)
        block 1
        block 2
          .
          .
    endwhile

(b)
    110 IF (inverse of condition) GOTO 390
    120
     .      (body of the loop)
    370
    380 GOTO 110
    390 (next statement after do...while loop)

Figure 6: The do...while structure: (a) pseudocode, (b) BASIC equivalent. The first statement (here line 110) is an IF statement that branches on the inverse of the given condition to line 390, two lines after the end of the loop body. The statements comprising the body of the loop are next (here lines 120 to 370), followed by a GOTO (here line 380) back to the first line of the loop.
(a)                 (b)
statement 1         110 (statement 1)
statement 2         120 (statement 2)
statement 3         130 (statement 3)
    .                   .
statement 10        200 (statement 10)

Figure 4: The “sequence” structure. The translation from pseudocode (a) to BASIC code (b) is simply one of coding several lines in ascending sequence.





(The answers are in figure 3.) 

Once the idea of solving problems in control structure forms becomes natural, coding the problem in BASIC is no more than a straightforward translation (see figures 4, 5 and 6). Notice, as mentioned before, that it is the inverse of the condition in the do...while and the if...then...else that appears in the BASIC code; this is because BASIC uses conditions for jumping instead of for not jumping. Except for that, coding structured BASIC is no more than a matter of practice.
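The translation rules of figures 5 and 6 can be checked mechanically. The Python sketch below is a toy line-numbered interpreter; its instruction set, the `run` function and the two sample programs are hypothetical, not any real BASIC. An if...then...else and a do...while are hand-translated exactly as the figures prescribe: the IF tests the inverse of the pseudocode condition, and a GOTO closes each branch or loop.

```python
def run(program, env):
    """Execute a toy line-numbered program and return its variables.

    `program` maps line numbers to one of these hypothetical instructions:
      ("do", fn)              call fn(env), then fall through
      ("if_goto", test, n)    jump to line n when test(env) is true
      ("goto", n)             jump unconditionally to line n
      ("end",)                stop
    """
    lines = sorted(program)
    pc = 0  # index into `lines`, not a line number
    while True:
        op = program[lines[pc]]
        if op[0] == "end":
            return env
        if op[0] == "do":
            op[1](env)
            pc += 1
        elif op[0] == "goto":
            pc = lines.index(op[1])
        elif op[0] == "if_goto":
            pc = lines.index(op[2]) if op[1](env) else pc + 1


# if X>=0 then Y=X else Y=-X endif, translated as in figure 5:
# the IF tests the inverse (X<0) and jumps to the "else" code.
ABSVAL = {
    110: ("if_goto", lambda e: e["X"] < 0, 140),
    120: ("do", lambda e: e.update(Y=e["X"])),
    130: ("goto", 150),
    140: ("do", lambda e: e.update(Y=-e["X"])),
    150: ("end",),
}

# S=0, I=1, do while I<=K: S=S+I, I=I+1 endwhile, translated as in
# figure 6: the IF tests the inverse (I>K) and a GOTO closes the loop.
SUM_TO_K = {
    100: ("do", lambda e: e.update(S=0, I=1)),
    110: ("if_goto", lambda e: e["I"] > e["K"], 140),
    120: ("do", lambda e: e.update(S=e["S"] + e["I"])),
    125: ("do", lambda e: e.update(I=e["I"] + 1)),
    130: ("goto", 110),
    140: ("end",),
}

print(run(ABSVAL, {"X": -7})["Y"])    # 7
print(run(SUM_TO_K, {"K": 4})["S"])   # 10
print(run(SUM_TO_K, {"K": 0})["S"])   # 0: the loop body never runs
```

The last call shows the do...while property noted earlier: because the test comes first, the body can execute zero times.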

After enough time for structured programming to become second nature to me, I found that certain applications of strict structured programming resulted in programs that were overly bulky or inelegant. Take the example of a program that sums up user-entered numbers until a zero is encountered. The structured pseudocode and BASIC equivalent are shown in figures 7a and 7b. But notice that flag F1 exists only to signal that the do...while loop should be terminated immediately, a situation fully determined by whether or not the last input N is zero. The test of N in statement 150 is the second thing done in the loop that goes from 130 to 190; if control could transfer at the end of the loop to 140, which drops into the test at 150, we could throw away F1 and the do...while test at 130, as in 7c, for a savings of one variable and several lines! This happens a lot in programs that interact with the user, so I thought, what if I devise a structure called “read X and do while (condition of X)”? It is still one-in, one-out; it’s easy to understand; and it’s barely different from a plain do...while. So I used it and began looking for other opportunities to add constructs to structured programming as originally conceived.

Figure 7: Improving “strict” structured programming: (a) is the pseudocode for a problem to add user inputs until a zero is encountered, using only sequence, if...then...else and do...while. (b) is the pseudocode of (a) translated into BASIC. (c) is an equivalent BASIC solution that saves three lines of code and one variable by slightly bending the form of a structured programming do...while loop.

(a)
    sum=0
    flag=1
    do while flag=1
        input num
        if num≠0
        then
            sum=sum+num
        else
            flag=0
        endif
    endwhile
    (next statement)

(b)
    130 IF F1≠1 GOTO 200
    140 INPUT N
    150 IF N=0 GOTO 180
    160 S=S+N
    170 GOTO 130
    180 F1=0
    190 GOTO 130
    200 (next statement)
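The difference between figures 7a and 7c can be sketched in Python, with a list standing in for the user’s keyboard input (an assumption; the list must contain a zero). The first function keeps the flag of figure 7a; the second uses the “read X and do while” shape, putting the read inside the loop test, much as figure 7c loops back to the INPUT at line 140. The function names are hypothetical.

```python
def sum_with_flag(inputs):
    """Figure 7a: a flag is the only way out of a plain do...while."""
    values = iter(inputs)          # stands in for INPUT
    total = 0
    flag = 1
    while flag == 1:
        num = next(values)
        if num != 0:
            total = total + num
        else:
            flag = 0
    return total


def sum_read_and_do_while(inputs):
    """'read N and do while N is not 0', the shape of figure 7c.

    The read sits in the loop test itself (a Python 3.8+ assignment
    expression), so control always returns to the read and no flag
    variable is needed.
    """
    values = iter(inputs)
    total = 0
    while (n := next(values)) != 0:
        total = total + n
    return total


print(sum_with_flag([5, 7, -2, 0, 99]))          # 10
print(sum_read_and_do_while([5, 7, -2, 0, 99]))  # 10
```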


Do...until


The structure closest to the three basic control structures is the do...until loop, illustrated in figure 8. The main difference between it and the do...while loop is that, in the do...until loop, the expression to be evaluated is the last statement in the loop instead of the first; this ensures that the body of the loop is done at least once.


Figure 8: The do...until loop: (a) pseudocode, (b) flowchart, (c) BASIC equivalent. The first statements (here lines 110 to 380) are the code for blocks 1 thru n. The next and last statement (here line 390) is an IF statement that branches on the inverse of the condition listed in the pseudocode to the first statement of the do...until loop (here line 110).

(a)
    do until test
        block 1
        block 2
          .
          .
        block n
    endloop if (condition)

(b) Flowchart not reproduced.





A do...until loop is written in BASIC by writing the statements in the body of the loop, then adding an IF statement that branches to the first statement of the loop if the condition is not met (the inverse of the original conditional expression is used here).

Consider our earlier problem of adding a number of user inputs until a zero is encountered. The solution to this, using the do...until structure, is given in figure 9. Notice in the pseudocode that I’ve put the conditional relation on the last line of the loop to remind the reader that the test is made at the end, not the beginning, of the loop. You may prefer to write the conditional expression on the same line as the words do until (in this case, line 2 of the pseudocode, as do until N = 0), but you will have to remember that the test is delayed until after the last line of the loop.

Looking at the BASIC code, there is a one-to-one correspondence between the pseudocode and BASIC code except that line 2, the do until line, has no equivalent, and line 5, the endloop line, translates into an IF statement that repeats the loop only if N ≠ 0 (that is, only if the inverse of the conditional statement is true). (Looking back at figure 7c, we see that we have improved on our read N and do until N = 0 loop by one statement, mainly because a do until will always have one fewer BASIC statement than its corresponding do while.)

Figure 7 (c):
    140 INPUT N
    150 IF N=0 GOTO 200
    160 S=S+N
    170 GOTO 140
    200 (next statement)

Figure 8 (c):
    110
     .      (blocks 1 thru n)
    380
    390 IF (inverse of condition) GOTO 110
    400 (next statement)

Figure 9: Illustration of the do...until loop: (a) pseudocode, (b) BASIC equivalent. The problem illustrated is to create code that will sum all user inputs until a zero is encountered, then print the sum.

(a)
    S=0
    do until test of N
        input N
        S=S+N
    endloop if N=0
    print S

(b)
    110 INPUT N
    120 S=S+N
    130 IF N≠0 GOTO 110
    140 PRINT "SUM IS";S

Figure 10 (a) and (b):

(a)
    case on N
        if N=1, block 1
        if N=2, block 2
        if N=3, block 3
    endcase

(b)
    110 GOTO 120, 200, 250 ON N
    120
     .      (block 1)
    180
    190 GOTO 320
    200
     .      (block 2)
    230
    240 GOTO 320
    250
     .      (block 3)
    310
    320 (next statement)
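Python has no do...until statement, so a sketch of figure 9 has to write the loop as `while True` with the exit test as its final statement; the body then always runs at least once. A list stands in for the user’s inputs (an assumption; it must contain a zero), and the function name is hypothetical. Note that adding the terminating zero to the sum before testing is harmless, which is what lets figure 9 place S=S+N ahead of the test.

```python
def sum_do_until(inputs):
    """Figure 9 pattern: body first, exit test last (do...until)."""
    values = iter(inputs)          # stands in for INPUT N
    total = 0                      # S=0
    while True:                    # do until test of N
        n = next(values)           # input N
        total = total + n          # S=S+N (adding the final 0 is harmless)
        if n == 0:                 # endloop if N=0
            break
    return total                   # "print S" would follow here


print(sum_do_until([5, 7, -2, 0, 99]))  # 10
```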


Case


The case statement is used when the value of a variable determines which of N mutually exclusive blocks of code is to be executed (with control passing to the next statement after the chosen block is performed); from this, you can see that the if...then...else structure is a special case statement with N = 2.

Figure 10 (c):
    110 IF N≠1 GOTO 190
    120
     .      (block 1)
    180
    190 IF N≠2 GOTO 240
    200
     .      (block 2)
    230
    240 IF N≠3 GOTO 320
    250
     .      (block 3)
    310
    320 (next statement)

Figure 10: The case statement: (a) an example in pseudocode, (b) the example in BASIC using a computed GOTO, (c) the example using a series of IF statements. The computed GOTO (b) is used when the values that N takes can be “boiled down” to the integers 1, 2, 3, ... (for example, if N took the values 10, 15, 20, we would GOTO on (N-5)/5). A series of IF statements (c) would be used when the values of N are irregular and (b) cannot be used.

(a)                          (b)                          (c)
100 FOR I=1 TO 200           100 FOR I=1 TO 200           100 FOR I=1 TO 9999
                                                          110 IF A>B GOTO 360
 .                            .                            .
170 IF A>B GOTO 220          170 IF A>B GOTO 700
 .                            .                            .
350 NEXT I                   350 NEXT I                   350 NEXT I
360 (next statement)         360 (next statement)         360 (next statement)

Figure 11: Uses of FOR...NEXT loops in structured programming. A FOR...NEXT loop is okay if no statement within the loop ever transfers control outside the main loop; see (a). The situation in (b) is definitely not structured; there is no way to guarantee that line 360 (and subsequent lines) will be done when the FOR...NEXT loop is completed. A do...while loop may be fashioned as in (c), which is equivalent to do while A≤B. Note that the index of the loop, I, is simply “marking time” and so cannot be used for another purpose within the loop.

Figure 12: The beginloop...exitif...endloop structure: (a) flowchart, (b) pseudocode, (c) BASIC equivalent. Notice the IF statements (here at 160, 320 and 450) all branching to the first line of the next structure (here line 470, just after the GOTO at line 460). The second line after the end of block 3 (here line 460) is a GOTO that jumps to the first statement of block 1.

(a) Flowchart not reproduced.

(b)
    beginloop
        block 1
        exitif (condition 1)
        block 2
        exitif (condition 2)
        block 3
        exitif (condition 3)
    endloop

(c)
    110
     .      (block 1)
    150
    160 IF (condition 1) GOTO 470
    170
     .      (block 2)
    320 IF (condition 2) GOTO 470
     .      (block 3)
    450 IF (condition 3) GOTO 470
    460 GOTO 110
    470 (next statement)

Figure 13: The beginblock...loopif...endblock structure: (a) flowchart, (b) pseudocode, (c) BASIC equivalent. Notice that the IF statements (here at 160, 320 and 460) go to the beginning of block 1 and that they branch on the condition itself, not the inverse.

(a) Flowchart not reproduced.

(b)
    beginblock
        block 1
        loopif (condition 1)
        block 2
        loopif (condition 2)
        block 3
        loopif (condition 3)
    endblock

(c)
    110
     .      (block 1)
    160 IF (condition 1) GOTO 110
     .      (block 2)
    320 IF (condition 2) GOTO 110
     .      (block 3)
    460 IF (condition 3) GOTO 110
        (next statement)

A case statement is implemented in BASIC either by sequential IF statements or, if the variable can be “boiled down” to an integer ranging from 1 to N, by a computed GOTO statement. Remember that, since control eventually passes to the first statement after the case statement, no block within the case statement may contain a GOTO statement except as the last statement within a block, branching to the first statement after the case statement; to do otherwise would damage the structure’s property of one-in, one-out.

An example of a case pseudocode statement and two BASIC equivalents is given in figure 10. Note that, when using a computed GOTO, each block of code except the last must end with GOTO nnn, where nnn is the next line after the case statement (the last block simply falls through to that line). In figure 10c, IF statements are used to branch around the blocks of code if the variable N does not have the appropriate value for that block.
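A Python sketch of both ways of implementing figure 10’s case statement, using the caption’s example values 10, 15 and 20 for N. A list of functions indexed by (N-5)/5 plays the part of the computed GOTO, and a chain of IF tests handles the general case. The block functions and all names are hypothetical placeholders, and the range check (which the BASIC figures do not show) is an added safety assumption so that an unexpected N runs no block at all.

```python
def block_1():
    return "block 1"


def block_2():
    return "block 2"


def block_3():
    return "block 3"


# Computed-GOTO style (figure 10b): reduce N to 1, 2, 3 and use it as an index.
BLOCKS = [block_1, block_2, block_3]


def case_computed(n):
    index, rem = divmod(n - 5, 5)        # 10 -> 1, 15 -> 2, 20 -> 3
    if rem != 0 or not 1 <= index <= len(BLOCKS):
        return None                      # no block applies: fall through
    return BLOCKS[index - 1]()


# Chain-of-IFs style (figure 10c): usable even when the values are irregular.
def case_ifs(n):
    if n == 10:
        return block_1()
    if n == 15:
        return block_2()
    if n == 20:
        return block_3()
    return None


print([case_computed(n) for n in (10, 15, 20)])
# ['block 1', 'block 2', 'block 3']
```

Either way, exactly one block (or none) runs and control then reaches the statement after the case, preserving one-in, one-out.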





Subroutines, User Defined Functions, and FOR...NEXT Loops


One of the most important features of a structured program is that it is composed of one-in, one-out blocks that are not jumped into or exited from except at the beginning or the end of the block. Therefore, as far as I am concerned, there is no reason from the structured programming point of view why I can’t use both subroutines and user defined functions (using the DEF statement) in my structured programs; they are both one-in, one-out constructs of BASIC and save repeating identical code.

Using the FOR...NEXT loop is a different matter. Unlike the subroutine or the user defined function, control can be transferred from anywhere inside the loop to anywhere outside the loop; for this reason, a FOR...NEXT loop by itself is unsuitable for a structured program and should be replaced by either a do...until or a do...while loop (if used properly, a FOR...NEXT loop can be used to implement either of these; see figure 11). But a FOR...NEXT loop’s most valid use is simply as a shorthand for a block of code to be repeated identically a given number of times. Used this way, the loop keeps the one-in, one-out feature necessary to all structured programming control structures.


Beginloop...exitif...endloop


The beginloop...exitif...endloop structure is described in several books detailing advanced structured programming techniques, and while it does not have the gut-level intuitive appeal the basic three do, it keeps popping up in programs I write, so it must be fairly useful.

The flowchart for the beginloop...exitif...endloop structure is in figure 12a. It is basically a loop with several exit points (do...while and do...until can be seen as specific cases of this general form). It has one entrance and one exit (several exit points, but the transfer of control is always to the next statement after the loop structure), and an example in pseudocode and BASIC is shown in figures 12b and 12c. Note that here the conditional expression, and not its opposite, is translated from pseudocode into BASIC (see lines 160, 320 and 450, figure 12c).


Other Structures


Even with all the above structures, I keep finding situations that can’t be fitted into any of them. So when several situations came up repeatedly, I modified existing structures to fit them and still be of the most general use. But, for the structures that remain, the emphasis is more on convenience than on utility.

A very useful variation of the beginloop...exitif...endloop structure is one that loops (instead of exits) when certain conditions occur. I call this a beginblock...loopif...endblock structure (see figure 13); it is very useful for performing a certain operation until all of a series of conditions are met. An example of this is the code in figure 14 that requests from the user an integer input between 1 and 10; notice that we loopif the input N is not between 1 and 10, and we also loopif N is not an integer.

(a)                              (b)
    beginblock
        input N                      110 INPUT N
        loopif N>10 or N<1           120 IF N>10 OR N<1 GOTO 110
        loopif N ≠ INT(N)            130 IF N≠INT(N) GOTO 110
    endblock
    (next statement)                 140 (next statement)

Figure 14: An example of beginblock...loopif...endblock: (a) pseudocode, (b) BASIC equivalent. The problem illustrated is to get an input from the user that is both between 1 and 10 and an integer. Here the second block (between the two loopifs) is empty.

I have created another pseudocode instruction called read...until valid for the specific purpose of reading and validating a user input, usually when the validation process is very simple. The BASIC code for the above problem is the same as in figure 14b (unless you want to add some error message statements), and the pseudocode is simply:

    read N until valid
        invalid if N not between 1 and 10
        invalid if N not integer

with the two invalid lines not necessarily written down. (Notice that for the last two structures, the conditional expressions used in the pseudocode are not inverted when transferred to the BASIC code.)
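Both multi-test loops can be sketched in Python with `while True`: a `continue` plays the part of a loopif (branch on the condition itself, back to the top), and a `break` plays the part of an exitif (branch on the condition itself, to the statement after the loop). The first function follows figure 14; the second is a hypothetical two-exit example (stop at a zero input or once a running total passes a limit) invented for illustration, not taken from the article. Lists stand in for keyboard input.

```python
def read_valid(inputs):
    """beginblock...loopif...endblock, as in figure 14."""
    values = iter(inputs)          # stands in for the keyboard
    while True:                    # beginblock
        n = next(values)           # input N
        if n > 10 or n < 1:        # loopif N>10 or N<1
            continue
        if n != int(n):            # loopif N ≠ INT(N)
            continue
        return int(n)              # endblock: every test passed


def sum_with_two_exits(inputs, limit):
    """beginloop...exitif...endloop (hypothetical example)."""
    values = iter(inputs)
    total = 0
    while True:                    # beginloop
        n = next(values)           # block 1
        if n == 0:                 # exitif (condition 1)
            break
        total = total + n          # block 2
        if total > limit:          # exitif (condition 2)
            break
    return total                   # next statement after endloop


print(read_valid([0, 12, 2.5, 7]))                # 7
print(sum_with_two_exits([4, 5, 0, 9], 100))      # 9
print(sum_with_two_exits([40, 50, 60, 0], 100))   # 150
```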


The Beginning of the End 


Those are all the structures I’ve come up with. They may or may not be justified in your mind by the improvements they allow over “strict” structured programming; but each of them is (at worst) a shorthand that takes the programmer a step further from planning a program in instructions and a step closer to planning in well-defined subtasks. And because each of these structures can be built from the statements of any language that allows unlimited GOTOs, it is simple to write structured programs in BASIC (or in any other all-purpose language), using modules of code that are functionally independent and “one-in, one-out.”

However, several other aspects of problem solving can benefit from the application of a methodical technique: problem definition, program design, debugging and testing, and program revision. (This becomes less of a luxury and more of a necessity as program size increases.) In the second part of this article I’ll use the problem of writing a game to play NIM to illustrate the use of structured programming in the entire problem solving process.
In part 1 I covered the basic constructs of structured programming, several additional structures (see table 1), and how to program them in BASIC. Now I want to show my idea, at least, of good programming habits, as well as the application of structured programming techniques to the entire range of problem solving. As an example, I will show how I went about writing a program that plays the game of NIM.


NIM as a Computer Game 


I picked NIM because it is simply analyzed, making it possible to concentrate on the writing of the program and not on the development of the computer’s playing strategy.


Basic Structured Constructs
    sequence
    if...then...else
    do...while

Added Structured Constructs
    do...until
    case
    subroutines
    for loops
    beginloop...exitif...endloop
    beginblock...loopif...endblock
    read...and do while
    read...until valid

Table 1: The constructs of applied structured programming, as explained in part 1. The “basic” constructs are universally recognized as being sufficient to implement any program; the “added” constructs are recommended by the author as extensions of the basic constructs that make structured programming more versatile and manageable.


Applied Structured Programming 
...and How to Use It: Part 2 


Gregg Williams 


The rules are as follows: the game starts 
with a pile of, say, 17 sticks. Players alter- 
nate turns, taking one, two or three sticks. 
The person taking the last stick loses. 

It doesn’t take much analysis to show that a player is in a “safe” position if there are 1, 5, 9, 13, ... sticks left after his or her move. No matter how an opponent moves, the player can take enough sticks (four minus the opponent’s move) to put the game back to a “safe” position. The player’s opponent is hamstrung and will definitely lose.
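The “safe position” claim can be checked by brute force. The Python sketch below (the name `mover_wins` is hypothetical) asks, for each pile size, whether the player about to move can force a win under the stated rules (take one, two or three sticks; whoever takes the last stick loses), memoizing results with `functools.lru_cache`.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def mover_wins(pile):
    """True if the player about to move can force a win.

    An empty pile means the opponent just took the last stick and
    lost, so the player "to move" has already won.
    """
    if pile == 0:
        return True
    return any(not mover_wins(pile - take) for take in (1, 2, 3) if take <= pile)


# Piles that are lost for the player facing them, i.e. the "safe"
# piles to leave behind after your own move.
print([p for p in range(1, 30) if not mover_wins(p)])
# [1, 5, 9, 13, 17, 21, 25, 29]
```

The losing piles for the player facing them are exactly those of the form 4n+1, which is why leaving 1, 5, 9, 13, ... sticks is safe.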

The computer’s strategy is given in table 2 and is based on what the pile of sticks looks like in terms of multiples of 4. The computer wants to leave the pile in the form (4n+1), a safe position for it. But the computer is in a bad situation if the pile looks like (4n+1) at the beginning of its turn (this also means the human player is in a “safe” position and will win if the correct moves are made). In this case, the moves of 1, 2 and 3 all leave the computer in an “unsafe” position; so, for this program, I decided to let the computer take 1 so as to prolong the game.


A Problem Solving Approach 


Before I get into the hand waving that will enable you to see how I wrote this program, I’d like to give you an overview of how I think a program should be attacked à la structured programming.

Step 1: Define the program in terms of what it will and will not do. Don’t laugh: who hasn’t been coding a program only to remember, “Omigosh, I forgot to put in something to...”? Keeping last-minute additions or afterthoughts to a minimum reduces the possibility of unexpected interaction between statements, often called bugs, glitches, blowups and so on.

Step 2: Flowchart “the big picture.” A lot of this is intuitive, but it means breaking the program into the first subprograms that come to mind and showing where these come in program flow. Unless a program is very simple, it is hard to go straight past this step into step 3; I have to literally see the program flow in flowchart form before I can begin thinking in terms of if...then...else and do...while and other control structures.

Step 3: Translate this into an overview 
with structured pseudocode. By now you 
have to have a loose idea of what the sub- 
programs (let’s call them modules) will do. 
You don’t have to have module definition 
pinned down to the finest point, for simply 
having thought in terms of modules means 
you’ve already put more thought into this 
stage than most people do. Also, gluing 
the modules together with structured 
pseudocode gets you started toward a 
structured program. It won’t be structured 
unless it starts structured. 

Step 4: Program each module in pseudocode. Aha, here’s where you find out what you’ve left out. Notice I said “program.” I mean it: you should write out exactly how a module will be executed as if it were the program that goes into the computer. The pseudocode should be so detailed that translating it into BASIC (or any other language) is almost a mechanical chore. This step may cause you to go back to step 3, but that’s okay, for it is probably faster to revise on paper than in the computer.

A note on modules: a key factor in the success of a structured program is the functional independence of modules. This means that a module should do a certain thing regardless of what the modules before it do, thus minimizing the possibility of unexpected module interaction. For example, if module A is designed to perform some computation on variables X and Y giving result Z, the only way module B, which calls module A, should be able to influence results is by changing the inputs X and Y prior to the call. The internal machinations of B should not affect A except through the identified input and output parameters of module A.

Step 5: Translate each module into BASIC code. Using the forms I outlined in part 1, going from pseudocode modules to BASIC modules is a mechanical translation process; the only thing you really need to think about is assigning and keeping track of variables and functions. I use a chart to do that; see figure 8 for an example.

Step 6: Test each module. You’re on your own here, but you must do something to check out what a module is supposed to be doing in terms of function, input and output. This is a “bottom up” approach to programming (note, however, that the design is “top down”). Although “top down” programming has been praised for its ability to catch unexpected module interaction, I ask: how can it do so until those modules (mistakes and all) have themselves been written? (An incisive analysis of the design process is given by Knuth in The Art of Computer Programming, Fundamental Algorithms, volume 1, pages 187 to 189 in the second edition.)

Number of Sticks in Pile at        Number of Sticks
Beginning of Computer’s Turn       Computer Takes
4n                                 3
4n+1                               1
4n+2                               1
4n+3                               2

Step 7: Test the program. Glue the 
modules together with the BASIC equiva- 
lent of the pseudocode from step 3. Start 
it running and hunt down bugs. Even if it 
works, keep hunting until you are tired 
of running the program. The brevity of 
this step (and the assurance that the 
program will not one day unexpectedly 
blow up) is your reward for the work done 
in the first six steps. 

Step 8: If you add to the program, add structured code. I know it’s hard to do. It’s even hard for me to do, and I’m the one who’s writing this article. But, unless the addition is extremely trivial, make sure that the code you add fits in, in a structured sense. Don’t jeopardize functional independence. Do break down a module if necessary to rewrite it.


NIM: Initial Design (Steps 1 thru 3) 


Now we’re ready to work on the NIM playing program. After thinking about the possibilities, I decided on this rough working definition: This program will play a series of NIM games against a human opponent. It will use the residue-of-four (modulo 4) algorithm for its strategy and will give the user the option of choosing who goes first and how many sticks are in the pile; the default will be 17 sticks and human goes first, an automatic win for the computer. The program will also check human inputs for validity.

My initial flowchart is in figure 1. Notice that there are four basic modules: initialization, player-turn, computer-turn and evaluation. If you want to go into more complex detail (and you will have to in a larger program), you can say that initialization basically sets the number of sticks in the pile and who goes first. Player-turn accepts a move, checks its validity, and subtracts the move from the pile. Computer-turn analyzes the pile, chooses a move, and subtracts the move from the pile. Evaluation


Table 2: Computer’s strategy for the NIM game. Either player is guaranteed a win (assuming that player makes no mistakes) if the pile of sticks is a number of the form 4n+1 at the end of his/her turn. Notice that when the computer’s turn begins with 4n+1 sticks, the computer is forced to take one, two or three sticks and so leaves itself in an unsafe position at the end of its move. The resulting position for the computer is safe after its moves from 4n, 4n+2 and 4n+3 sticks, and unsafe after its move from 4n+1 sticks.
Figure 1: A high level flowchart for the NIM program. Most of the blocks represent some large chunk of the overall problem; such blocks are called “modules.” Actions common to more than one block should be brought outside the blocks and shared. For example, the block labeled “evaluation” was part of both modules “player-turn” and “computer-turn” until it was seen that it could be brought outside and made a module of its own.

(Flowchart blocks: INITIALIZATION; PLAYER-TURN; COMPUTER-TURN; EVALUATION (OF BOARD); SET TO OTHER PLAYER.)

Figure 2: The structured pseudocode overview. This overview, equivalent to the flowchart of figure 1, is written in a pseudocomputer language and is the first step past the flowchart toward a completed BASIC program. Notice that the modules, whose details will be filled in later, are noted as names enclosed in parentheses, that preliminary variables are used and given descriptive names, and that the entire problem is outlined by this pseudoprogram.


 0  NIM:
 1    games-played = 0
 2    do until test of endsession
 3      (initialize)
 4      do until test of gamewon
 5        if computer’s-turn = 0 then
 6          (player-turn)
 7        else
 8          (computer-turn)
 9        endif
10        (evaluate)
11        computer’s-turn = 1 - computer’s-turn
12      endloop if gamewon
13      print endgame messages and ask if user wants to play again
14      receive user response
15      if response = yes (ie: if endsession = 0)
16        games-played = games-played + 1
17      endif
18    endloop if endsession
19  end-of-program


checks to see if the pile is down to 1 (in 
which case the winner is declared) and also 
makes several comments as endgame ap- 
proaches (this is the only module whose 
function grew as the module was written). 

The first real work is done with the creation of the structured pseudocode overview in figure 2. The process is fairly simple here because the program is well-defined as a flowchart; but with a flowchart that has constructs that are definitely not recognizable control structures, you have to twist the flowchart (maybe even rewrite it) until you can see it in terms of sequence, if...then...else and do...while.

Notice that at this step you begin to define flags. There will be a flag (which will have a value of 1 or 0, standing for true or false) for each of endsession, gamewon and computer’s-turn. Notice also that variable names are descriptive enough for the reader to understand exactly what is happening; be sure to keep this first pseudocode overview readable.


NIM: Detailed Design (Step 4) 


The bulk of thinking from here on out is 
in this step, the writing of each module of 
pseudocode. You will probably discover 
changes and additions you need to make; 
the advantage of doing so at this point is 
that it is easier to make corrections and 
revisions in pseudocode than it is to make 
them in BASIC. One reason for this is that, 
since pseudocode is not read by the com- 
puter, you do not have to spend any time 
making sure that it is syntactically correct 
(instead, you spend the same time making 
more changes); another is that pseudocode is 
easier to change because it is easier to 
read. Compare the pseudocode “if who- 
plays-first = computer, then. . .” with the 
BASIC statement “1600 IF P=0 GOTO 
1660.” 

The pseudocode for my basic modules is in figures 3 thru 7. Although I made really only one draft of each module before I translated it to BASIC, the draft itself contains many erasures and insertions; for me, working in pencil is a must. You may find an operation that occurs several times within the different modules; if so, you’ll want to make it either a module or a subroutine. In the case of the NIM program, I decided to make a module out of the part of code needed to print the current board position. I could have left it as part of the evaluate module, but to do so would have obscured the module’s purpose with too much detail. As it stands, the only information given in the evaluate module is “Print current board


Figure 3: The initialize module.


Relative  BASIC
Line No   Line No   Pseudocode
   0         -      initialize:
   1        220       if games-played = 0
   2        230         ask if user wants instructions
   3        240         read user-answer until valid
   4        260         if answer is ‘yes’
   5        280           print instructions
   6        310         endif
   7        310       endif
   8        330       ask user if he wants to choose number of sticks and who goes first
   9        360       read user-choice until valid
  10        380       if user-choice = default
  11        390         number-of-sticks = 17
  12        400         computer’s-turn = 0
  13                  else
  14        420         ask user how many sticks to begin with
  15        440         read number-of-sticks and do while number-of-sticks < 13
  16        460           error message ‘sorry, we have to have at least 13 sticks’
  17        470         endwhile
  18        490         ask user who goes first
  19        500         read computer’s-turn until valid
  20        510       endif


The pseudocode in figures 3 to 7 represents the second level of breaking a problem into subproblems (the first level was from defined problem to the structured overview of figure 2). Notice that the trend is to write the lines so that they are easily understood rather than to make them look like formal computer code. The numbers in front of most of the lines represent the beginning line numbers of the equivalent statements in the BASIC program.





Figure 4: The player-turn module.


Relative  BASIC
Line No   Line No   Pseudocode
   0         -      player-turn:
   1        810       do until test of invalid-move
   2        845         valid-move = 1
   3        830         ask user for his move
   4        840         input user-move
   5        850         if user-move not between 1 and 3
   6        855           print error message
   7        860           valid-move = 0
   8        860         endif
   9        870         if user-move > number-of-sticks
  10        875           print error message
  11        880           valid-move = 0
  12        880         endif
  13        900       endloop if valid-move = 1
  14        920       number-of-sticks = number-of-sticks - user-move
Figure 5: The computer-turn module.


Relative  BASIC
Line No   Line No   Pseudocode
   0         -      computer-turn:
   1       1120       remainder = number-of-sticks modulo 4
   2       1140       case on remainder
   3       1150         if remainder=0, then computer’s-move=3
   4       1170         if remainder=1, then computer’s-move=1
   5       1170         if remainder=2, then computer’s-move=1
   6       1190         if remainder=3, then computer’s-move=2
   7       1210       number-of-sticks = number-of-sticks - computer’s-move
   8       1230       print computer’s-move to user
Relative  BASIC
Line No   Line No   Pseudocode
   0         -      evaluate:
   1       1415       (print-board)
   2       1520       if number-of-sticks = 1
   3       1540         if computer’s-turn = 1
   4       1550           print computer-loses message
   5                    else
   6       1570           print user-loses message
   7       1570         endif
   8       1570       endif
   9       1590       if number-of-sticks between 6 and 8
  10       1600         if computer’s-turn = 1
  11       1620           if RND < 0.6
  12       1630             print computer-resigns message
  13       1640             number-of-sticks = 1
  14       1640           endif
  15                    else
  16       1660           if RND < 0.3
  17       1670             print you’re-in-trouble message
  18       1690             ask if user wants to resign
  19       1700             input user-answer until valid
  20       1720             if user-answer = ‘yes’
  21       1730               print user-resigns message
  22       1740               number-of-sticks = 1
  23                        else
  24       1760               print nasty answer
  25       1760             endif
  26       1760           endif
  27       1760         endif
  28       1760       endif


Figure 6: The evaluate module. “RND” refers to a random number between zero and one; when either player resigns, number-of-sticks is set to 1 to signal end-of-game.
position,” which tells the reader exactly
what is being done; if the reader wants more
detailed knowledge, it is possible to refer to
the print-board module.
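For readers who want to see the print-board module in a modern language, here is a minimal Python sketch of figure 7. It is an illustration, not a translation of the listing: it returns the board line instead of printing it, the function name `board_line` is hypothetical, and drawing each group of five as `/////` followed by a space is an assumption, since those characters are not legible in the original.

```python
def board_line(number_of_sticks: int) -> str:
    """Print-board module (figure 7), returning the text instead of printing it."""
    sticks = number_of_sticks      # working copy, destroyed as we go (like S1)
    line = "THE BOARD IS "
    while sticks > 5:              # do while sticks > 5
        line += "///// "           # one group of five (assumed rendering)
        sticks -= 5
    line += "/" * sticks           # for I = 1 to sticks: print '/';
    return line


if __name__ == "__main__":
    print(board_line(17))
```

Keeping the text-building separate from the printing is a design choice made only so the module can be checked by a test routine; the BASIC version prints each piece as it goes.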

The value of pseudocode can be seen in 
the fact that very little in the program needs 
to be explained. This is why high level lan- 
guages which are closer to pseudocode make 
better programming languages than BASIC.
Figures 3 and 4 are complicated only by 
read. . .until valid statements that check user 
responses (the checking of input data is 
usually a good idea unless your computer 
has real space problems). The computer-turn 
module, figure 5, implements the computer 
strategy of table 2. 
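The case statement of figure 5 can be sketched in Python as below. The function name `computer_move` is hypothetical, and a dictionary lookup stands in for the BASIC computed GOTO on R+1; whenever the remainder is not 1, the move leaves the opponent a count of the form 4k + 1.

```python
def computer_move(number_of_sticks: int) -> int:
    """Computer-turn module of figure 5 (hypothetical Python rendering).

    Case on (number-of-sticks modulo 4): the move is chosen so that the
    opponent is left with 4k + 1 sticks whenever that is possible.
    """
    remainder = number_of_sticks % 4
    # remainder 0 -> take 3, 2 -> take 1, 3 -> take 2; remainder 1 has
    # no winning move, so the module simply takes one stick.
    return {0: 3, 1: 1, 2: 1, 3: 2}[remainder]


if __name__ == "__main__":
    for sticks in range(13, 18):
        move = computer_move(sticks)
        print(f"{sticks} sticks: take {move}, leaving {sticks - move}")
```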

In the evaluate module, figure 6, if both
players play perfect games and the number
of sticks is six, seven, or eight, the player
who has just moved will definitely lose the
game (the opponent can take one, two or
three sticks, respectively, finishing with five
sticks, a “safe” position for the opponent).
The if statement beginning “if number-of-sticks
between 6 and 8” (line 9 of figure 6) and
ending with the last ‘endif’ of the module 
takes care of this situation. If the computer 
is about to lose, it resigns six-tenths of the 
time; otherwise, it gives the human a chance 
to resign three-tenths of the time. (Notice, 
in figure 1, that the evaluate module comes 


before the variable computer’s-turn is 
changed, so that, in the evaluate module, 
computer’s-turn=0 means that the human 
has just played and that the computer’s turn 
is next.) 
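The claim about six, seven and eight sticks can be checked by brute force. The Python sketch below assumes the rules stated in listing 1 (each move takes 1, 2 or 3 sticks, and the player who takes the last stick loses); the function `mover_wins` is a hypothetical helper, not part of the NIM program.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def mover_wins(sticks: int) -> bool:
    """True if the player about to move can force a win (last stick loses)."""
    if sticks == 1:
        return False  # forced to take the last stick, and so lose
    # Taking every remaining stick means taking the last one, so only moves
    # that leave at least one stick are worth considering.
    return any(not mover_wins(sticks - take) for take in (1, 2, 3) if take < sticks)


if __name__ == "__main__":
    losing = [s for s in range(1, 30) if not mover_wins(s)]
    print("positions lost by the player to move:", losing)
    for s in (6, 7, 8):
        print(s, "sticks: player to move can force a win:", mover_wins(s))
```

Run over 1 to 29 sticks, the positions lost by the player to move come out as 1, 5, 9, 13, and so on, that is, counts leaving a remainder of 1 when divided by 4, which is the remainder the computer-turn module tries to leave its opponent.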


NIM: Translating to BASIC (Step 5) 


Two aspects of the pseudocode-to-BASIC 
translation need attention: the translation 
itself (covered in Part 1), and the assign- 
ment of variable names and meanings. The 
translation, although it requires attention, 
is straightforward enough once you are used 
to it. Assigning BASIC line numbers cor- 
responding to relative line numbers of 
pseudocode (given in figures 3 to 7) should 
make matters easier. 

The assignment of variable names,
however, is another matter. Each named
variable or flag must be replaced with a
letter or letter-plus-digit name. I think it is a
good idea to keep track of what names have
been used, what modules they are used in,
and what they are used for. An example of
the chart I usually make (here, for the NIM
program) is in figure 8. I also try to decide
whether or not I need to store a variable's
value for use later in the program; if not, I
can use the same variable later and save a
few bytes of storage (when compared with
creating and using a new variable).

When writing a module of BASIC code, I
write the entire module, using a circled
letter in colored pencil to link the space
after a GOTO to the line number it belongs
with. Then I number the entire module
and replace the circled letters with the cor-
rect line numbers.
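The circled-letter method amounts to a two-pass numbering: first give every line a number, then fill in the GOTO targets. A minimal Python sketch, assuming a hypothetical notation in which `<A>`, `<B>`, ... stand for the circled letters (the function name `number_module` is also hypothetical):

```python
import re

# A symbolic label such as <A> stands in for a circled letter.
LABEL = re.compile(r"<([A-Z])>")


def number_module(lines, start, step=10):
    """Number a module written with symbolic labels, then resolve the labels.

    `lines` is a list of (label, text) pairs; label is None for unlabeled
    lines. An undefined label raises KeyError.
    """
    numbered = []
    line_of = {}
    # Pass 1: assign a line number to every statement and note each label.
    for index, (label, text) in enumerate(lines):
        number = start + index * step
        numbered.append((number, text))
        if label is not None:
            line_of[label] = number

    # Pass 2: replace every <label> reference with the line number it names.
    def resolve(match):
        return f"{line_of[match.group(1)]:04d}"

    return [f"{number:04d} " + LABEL.sub(resolve, text) for number, text in numbered]


if __name__ == "__main__":
    fragment = [
        ("A", "INPUT S"),
        (None, "IF S>=13 GOTO <B>"),
        (None, "PRINT '**SORRY, WE HAVE TO HAVE AT LEAST 13 STICKS**'"),
        (None, "GOTO <A>"),
        ("B", "PRINT 'TYPE ZERO (0) TO GO FIRST; ELSE TYPE 1'"),
    ]
    for line in number_module(fragment, start=440):
        print(line)
```

Applied to this fragment (modeled loosely on the stick-count check of the initialize module), the labels resolve to 0480 and 0440, so the branch lines read `0450 IF S>=13 GOTO 0480` and `0470 GOTO 0440`.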

Comments are very important and, unless 
you are working with severe memory restric- 
tions, there is no excuse for your not using 
them. (Even with memory problems, put 
comments in your final draft and keep a 
copy.) For example, the comment 


2150 REM K IS THE SUM OF I AND J 
2160 K=I+J


is extremely lame. But if, in its place, you 
write 


2150 REM K IS SUM OF FIRST AND SECOND GROUP SCORES


then, within the context of the program, the 
comment will probably remind you (after a 
long absence) of several things you'd for- 
gotten. Given the restrictions on variable 
names in BASIC, comments are more neces- 
sary than they would be in languages with 
longer name possibilities. 


I might point out several places where
you should always use comment statements. 
One is at the beginning of the program for 
a summary of the name of the program, 
your name as its author, purpose, and so on. 
See lines 50 to 70 in the NIM program in 
listing 1 for example. 

Another place to put comments is at the 
beginning and end of modules, if possible, 
with some eye-catching typography. BASIC 
programs do seem to run together after a 
while (see lines 200, 520, 800, 930 and 
others in listing 1). 

A third place to put comments is just 
before a major control structure (ie: one 
spanning more than a few lines of code). 
Gertrude Stein might not have said, “An if 
is an if is an if. . .,” but she should have. 
Things are easier if you know that an IF 
statement is actually the beginning of a 
do. . .until, an if. . .then. . .else, or some- 
thing else. For example, look at the com- 
ments at lines 810 and 890 of the NIM 
program: 


0810 REM DO UNTIL 900; ENDLOOP IF VALID (V=1)

(body of do. . .until)

0900 IF V=0 GOTO 0820


A glance at line 810 tells us we are beginning
a do. . .until that ends at 900; it also tells us
the condition and the reason for looping.
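Python has no do. . .until statement, so the player-turn module of figure 4 is usually written with `while True` and a test-and-`break` at the bottom of the loop, which plays the role of line 900. A sketch under the assumption that moves arrive through a `read_move` callable (a hypothetical stand-in for BASIC's INPUT, which also makes the module easy to test):

```python
from typing import Callable, List, Tuple


def player_turn(number_of_sticks: int, read_move: Callable[[], int]) -> Tuple[int, List[str]]:
    """Player-turn module (figure 4) written as a do...until loop.

    Returns the new stick count and the error messages issued on the way.
    """
    messages = []
    while True:                                   # do
        valid_move = 1
        user_move = read_move()
        if not 1 <= user_move <= 3:
            messages.append("YOUR MOVE ISN'T BETWEEN 1 AND 3")
            valid_move = 0
        if user_move > number_of_sticks:
            messages.append("THERE AREN'T THAT MANY PIECES LEFT")
            valid_move = 0
        if valid_move == 1:                       # until valid-move = 1
            break
    return number_of_sticks - user_move, messages


if __name__ == "__main__":
    moves = iter([5, 3, 1])
    print(player_turn(2, lambda: next(moves)))
```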

Heavily commenting a program reaps
benefits so intangible that it is difficult to
justify the time, memory and effort that
commenting requires. You have always
heard that comment lines greatly help a
programmer who must examine a program
weeks or months after it is written. But you
probably do not realize that the very act of
writing down the comment, of trying to find
the most important few words that will help
to clarify the situation, not only helps you
remember a given fact longer but also
causes you to analyze the situation (and
thereby understand it better), and may even
lead you to find a mistake you had not seen.

Remember, comments may take effort, 
but the whole idea of structured program- 
ming techniques is that effort on the front 
end will save greater efforts later on. 


NIM: Testing (Steps 6 and 7) 


A module is tested by writing code 
around it that provides it with the variables 
that affect the module’s behavior and state- 
ments that somehow display the module’s 
output. Then the module-plus-test-routine 
should be run, varying the inputs across 
their spectrum as much as is practical 
(testing all possible input combinations is 
the only foolproof method but, alas, be- 


Relative  BASIC
Line No   Line No   Pseudocode
   0         -      print-board:
   1       1415       print ‘THE BOARD IS’;
   2       1420       sticks = number-of-sticks
   3       1430       do while sticks > 5
   4       1440         print ‘/////’;
   5       1450         sticks = sticks - 5
   6       1460       endwhile
   7       1470       for I = 1 to sticks
   8       1480         print ‘/’;
   9       1490       next I
  10       1500       print ‘ ’


Figure 7: The print-board module. This module is actually part of the evaluate module but is separated from it for purposes of clarity. Sticks is a new variable that is decremented as the current board position is printed; notice the semicolons in the print statements that make the statements print on the same line, as in BASIC.


Variable
Name      Use
N1        number of games completed; used outside all modules
C         user-choice to choose sticks and first player, temporary
C         user-choice to resign, temporary
S         number-of-sticks
P         computer’s-turn; indicates next to play: computer=1, player=0
V         during player’s turn, indicates if move valid: 1=yes, 0=no
M1        player-move = 1, 2 or 3
M9        computer-move
R         remainder of number-of-sticks (S) modulo 4, temporary
S1        equivalent to S, destroyed by the print-board module


Figure 8: A table to keep track of variables used. This table shows which valid 
BASIC variable names are being used; in which modules they are used; the 
variable’s meaning and whether or not the variable's value needs to be saved. 
Note that C is a temporary variable; since its value in the initialize module 
need not be saved, it is used again in the evaluate module for another pur- 
pose. In a more complex program, you would make a note by the variable 
name if it is an array (numeric or character) as opposed to a simple variable. 





comes infeasible very quickly). The outputs 
should be predicted before the test is run
and then verified; “eyeballing” the outputs
often lets mistakes slip by that you would
otherwise catch.
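A module test in this spirit can be very small: a driver that feeds the module chosen inputs and compares what comes back with outputs predicted beforehand. The Python sketch below applies it to the computer-turn case of figure 5; the names `run_module_test` and `PREDICTED` are hypothetical, and the predicted values were worked out by hand from the modulo-4 table.

```python
def computer_turn(number_of_sticks):
    """Module under test: the case on (number-of-sticks modulo 4) from figure 5."""
    return {0: 3, 1: 1, 2: 1, 3: 2}[number_of_sticks % 4]


def run_module_test(module, predicted):
    """Drive `module` with each input; return (input, predicted, actual) mismatches."""
    failures = []
    for given, expected in sorted(predicted.items()):
        actual = module(given)
        if actual != expected:
            failures.append((given, expected, actual))
    return failures


# Predicted outputs, worked out by hand *before* the test is run.
PREDICTED = {6: 1, 7: 2, 8: 3, 13: 1, 14: 1, 15: 2, 16: 3, 17: 1}

if __name__ == "__main__":
    print(run_module_test(computer_turn, PREDICTED) or "all predictions confirmed")
```

Returning the mismatches, rather than printing every output, keeps the comparison against the written prediction explicit instead of leaving it to eyeballing.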

Program testing is usually more frustrating than module testing, mainly because only the most elusive bugs evade module testing. But the method of predicting program behavior and output for a given set of inputs remains much the same as for module testing.

Because I had only four modules and such a simple design, I skipped module tests and went on to test the entire program by playing a few games. I found the following errors: two typing errors, flag V was not set (line 845 was added), and a flag was set wrong at 1540. At this point, the program was functionally working.


Listing 1: The completed NIM game, written in BASIC. This program plays multiple games of NIM against a human opponent with endgame messages to the user that differ from game to game. The two most important characteristics of the program are, first, the liberal use of REM statements, and second, the coding of the program in terms of structured programming control structures, which greatly simplifies program design and debugging. (This program was run on an IBM 5100.)


0050 REM ***NIM PROGRAM, WRITTEN BY GREGG WILLIAMS***
0060 REM ***WRITTEN 15 APR 77, LAST UPDATE 16 APR 77***
0070 REM ***TRY IT--IT CAN BE BEATEN***
0100 N1=0
0200 REM *****MODULE INITIALIZE--END AT 520*****
0210 REM ..[illegible] OPTION IF FIRST GAME (N1=0)..
0230 PRINT 'DO YOU WANT INSTRUCTIONS? (1=YES, 0=NO)'
0240 INPUT N1
0250 IF N1#0&N1#1 GOTO 0240
0260 IF N1=0 GOTO 0320
0270 PRINT ' '
0280 PRINT 'OKAY. NIM IS PLAYED WITH 17 OR MORE STICKS, WITH A'
0290 PRINT 'MOVE CONSISTING OF THE PLAYER''S TAKING 1, 2, OR 3'
0300 PRINT 'PLAYING PIECES, OR ''STICKS''. WE ALTERNATE TURNS,'
0310 PRINT 'AND THE PLAYER FORCED TO TAKE THE LAST STICK LOSES.'
0320 PRINT ' '
0330 PRINT 'I USUALLY PLAY WITH 17 STICKS AND YOU GOING FIRST.'
0340 PRINT 'TYPE 1 IF THAT''S OK WITH YOU, 0 OTHERWISE'
0350 PRINT ' '
0360 INPUT C
[illegible]
0390 S=17
0400 P=0
0410 GOTO 0520
0420 PRINT 'HOW MANY STICKS DO YOU WANT TO START WITH?'
0440 INPUT S
0450 IF S>=13 GOTO 0480
0460 PRINT '**SORRY, WE HAVE TO HAVE AT LEAST 13 STICKS**'
[illegible]
0490 PRINT 'TYPE ZERO (0) TO GO FIRST; ELSE TYPE 1'
0500 INPUT P
0510 IF P#0&P#1 GOTO 0500
0520 REM ***END OF MODULE INITIALIZE***
0570 REM
0600 IF P=1 GOTO 1100
0610 REM
0800 REM ***MODULE USER'S-TURN--END 930***
0810 REM ..DO UNTIL 900, ENDLOOP IF MOVE IS VALID (V=1)..
0820 PRINT ' '
0830 PRINT 'YOUR TURN--ENTER YOUR MOVE'
0840 INPUT M1
0845 V=1
0850 IF M1>=1&M1<=3 GOTO 0870
0855 PRINT 'YOUR MOVE ISN''T BETWEEN 1 AND 3'
0860 V=0
0870 IF M1<=S GOTO 0890
0875 PRINT 'THERE AREN''T THAT MANY PIECES LEFT'
0880 V=0
0890 REM ..NEXT STMT IS TEST FOR END OF DO UNTIL LOOP..
0900 IF V=0 GOTO 0820
0910 REM ..TAKE AWAY FROM CURRENT NUMBER OF STICKS..
0920 S=S-M1
[illegible]
1000 GOTO 1400
1100 REM ***MODULE COMPUTER'S-TURN--END 1240***
[illegible]
1130 REM ..CASE STMT.--R HAS VALUE 0,1,2,3, SO GO ON R+1..
1140 GOTO 1150,1170,1170,1190 ON (R+1)
1150 M9=3
[illegible]
1170 M9=1
[illegible]
1190 M9=2
1200 REM ..DECREMENT STICKS AND INFORM USER OF YOUR MOVE..
1210 S=S-M9
1220 PRINT ' '
1230 PRINT 'MY MOVE IS',M9,'STICK';
1233 IF M9=1 GOTO 1238
1235 PRINT 'S'
1236 GOTO 1240
1238 PRINT ' '
1240 REM ***END OF MODULE COMPUTER-MOVE***
1250 REM
1400 REM ****MODULE EVALUATE--END 1770****
1410 REM ..PRINT CURRENT BOARD..
1415 PRINT 'THE BOARD IS';
1420 S1=S
1430 IF S1<=5 GOTO 1470
1440 PRINT '/////';
1450 S1=S1-5
1460 GOTO 1430
1470 FOR I=1 TO S1
1480 PRINT '/';
1490 NEXT I
1500 PRINT ' '
1510 REM ..[illegible] THE FOLLOWING; NEXT STMT STARTS 1560..
[illegible] REM ..NESTED IF--IF COMP''S TURN, RESIGN; ELSE USER LOSES..
[illegible]
1540 IF P=1 GOTO 1570
[illegible]
1570 PRINT 'SORRY, CHUM! THAT''S THE LAST STRAW--YOU LOSE.'
1580 REM ..NEXT IF DEALS WITH COMP/USER RESIGNATION IF S=6,7,8..
1590 IF S>8|S<6 GOTO 1770
1600 IF P=0 GOTO 1660
1610 REM IF COMP IS TO PLAY, HE MAY OR MAY NOT RESIGN
1620 IF RND>.6 GOTO 1770
1630 PRINT 'AGGGH! YOU''VE GOT ME. [illegible] I RESIGN.'
1640 S=1
1650 GOTO 1770
1660 IF RND>.3 GOTO 1770
1665 PRINT ' '
1670 PRINT 'NO MATTER WHAT YOU DO, I''VE GOT YOU. DO YOU WANT'
1680 PRINT 'TO RESIGN GRACEFULLY, OR DO WE FIGHT IT OUT?'
1690 PRINT '(1 TO RESIGN, 0 TO PLAY)'
1700 INPUT C
1710 IF C#0&C#1 GOTO 1700
1720 IF C=0 GOTO 1760
1730 PRINT 'OK, I ACCEPT YOUR RESIGNATION. GOOD GAME.'
1740 S=1
1750 GOTO 1770
1760 PRINT 'OK, CLOWN, IT''S YOUR FUNERAL'
1770 REM ..END OF MODULE EVALUATE..
1900 REM CHANGE P TO REFLECT NEW PLAYER
1910 P=1-P
1920 REM ..DON'T LOOP IF END-OF-GAME (GIVEN BY S=1)
1930 IF S>1 GOTO 0600
1940 PRINT ' '
1950 PRINT 'DO YOU WANT TO PLAY ANOTHER GAME? (1=YES, 0=NO)'
1960 INPUT C
1970 IF C#1&C#0 GOTO 1940
1980 IF C=0 GOTO 2010
1990 N1=N1+1
2000 GOTO 0200
2010 PRINT 'OK. CALL ME UP WHEN YOU''VE GOT MORE TIME.'
2020 END
[illegible]


NIM: Additions (Step 8)


At this point, the NIM program is finished and running. However, playing several games, I noticed little things that bothered me: lines of output bunching together when they were not logically connected, error messages that needed to be included, the computer writing “MY MOVE IS 1 STICKS,” to name a few. So I repaired several things, mostly evident from lines in the BASIC program not ending in zero.

One option that does not show is the if. . .then statement at lines 220 and 230 that skips the asking-of-rules (lines 2 thru 6 in figure 3) for every game but the first. Fortunately, this could be added fairly easily by adding a new variable, games-played (or N1), updating it (at line 1990), and by placing lines 230 thru 320 in an if. . .then structure that gets done only if games-played equals zero.

Sometimes it takes more effort to add code so that the resulting program is still structured. But programs tend to grow quite a bit after the first time they are “finished.” So, in the interest of maintaining a structured program (which is easier to work on), I make it a rule to add structured code to my programs. In my experience programming at work, it’s been worth it.


Final Thoughts


If nothing else, I hope that I’ve convinced you that effort spent at the front end is later paid back, and with interest, because that’s what structured programming is all about. By planning your program before you write it, you eliminate time wasted in finding out what you’ve forgotten; by planning your program to fit certain control structures (thereby causing program flow to take a recognizable form), you save time by not having to untangle the spaghetti-like structures that you might otherwise come up with.





Structured programming in its broadest 
sense is several things. On the highest level, 
it is completely knowing the problem. On 
a middle level, it is the recursive process of 
repeatedly breaking a problem into subpro- 
blems until each subproblem, at whatever 
level, presents a self-evident solution. (Also, 
this level requires some awareness of the 
basic control structures.) On the lowest 
level, structured programming is writing 
each subproblem (using one of several 
given control structures) so that program 
flow is standardized to one of several recog- 
nizable and easily traced patterns. 

It’s strange that computer programmers 
took so long to analyze their own pro- 
gramming methods, especially since analysis 
is so necessary to the problem solving 
process. But the analysis was finally done, 
giving birth to the idea of structured pro- 
gramming. 

Structured programming is not univer-
sally acclaimed. But the fight between pure
structured and pure unstructured pro-
gramming is largely an academic one. In the
field, applied structured programming (or
many of its techniques, under different
names) is essential to programming complex,
real world problems. And that means that,
even in programs of computer experi-
menters, it couldn’t hurt.


Decision Tables: How to Plan Your Programs


Thomas G Bohon


Figure 1: Basic elements of a decision table. A decision table is a formal listing of a series of interconnecting facts and possible alternative actions associated with a particular situation or process.

    CONDITION STUB  |  CONDITION ENTRY      IF (condition statement)
    ACTION STUB     |  ACTION ENTRY         THEN (action statement)


Figure 2: Additional decision table elements. The table header (or name) allows each table to be uniquely referenced.

    TABLE HEADER    |  Rule 1  |  Rule 2


Extended Entry Example (rows: Compare Amount to Discount Amount; Compare Quantity to Quantity on Hand; Billing Rate; Quantity to Ship)

Figure 4: Example of a mixed entry decision table.


“Oh, no,” you say to yourself, “Another 
one of those fancy techniques which no 
one can understand, I can’t use, and I can
definitely get along without!” 

Did something like the above pass 
through your mind when you read the title 
of this article? Well, put aside your doubts 
for a second and read a bit further. I think
you'll be pleasantly surprised to learn that
you already know the process I’m going to
describe and, in fact, probably use it in a
very informal way every day. All I want to
do here is to formalize what you already 
know and show you how you can apply 
this knowledge to make the job of program- 
ming your home computer a little easier. 

What am I talking about? Decision 
tables, of course. And, after reading this 
article, you should have a better understand- 
ing of what they are, how they are con- 
structed, and how to use them effectively. 


Some Definitions Before We Begin 


A decision table is simply a formalized 
presentation of the mental process each of 
us goes through every time we are con- 
fronted with a series of facts which require 
us to decide on one course of action or 
another. Stated another way, a decision 
table is merely the writing down of the 
facts and possible alternative actions asso- 
ciated with a particular situation or process. 

In programming, decision tables act as 
effective substitutes for, or as an aid to, 
the block diagrams associated with prelim- 
inary flowcharting. They are used primarily 
when the situation being studied involves 
complex decision logic, since the decision 
table presents not only the original con- 
dition but also the course of action in 
an easy to understand and easy to use 
tabular form. 

There are two main sections of a deci- 
sion table (see figure 1). The upper section 
(shown as exactly half of the table, a proportion
not necessarily found in an actual
table) presents the possible conditions
upon which the decision will be based. 
The lower portion (again, not necessarily 
half of the table) presents all possible 


actions resulting from the possible decisions 
in the upper portion. 

Each portion of the table is further 
broken up into two sections, with the left 
hand section being called the stub and the 
right hand section called the entry. Thus, 
in our typical decision table we have a 
condition stub and a condition entry in the 
upper portion, and an action stub and an 
action entry in the lower portion. 

Figure 2 shows the remaining elements 
of a decision table. Note that there is a 
table header (sometimes called the label
or name) which allows each table to be 
uniquely referenced. This is necessary 
in complex situations where the condi- 
tions and actions may require multiple 
tables. 


Each rule in the entries is identified 
by a rule number. The condition stub 


describes a condition in a way that may 
be answered either yes or no (in one kind 
of table) or with a specific value. The 
condition entry provides the means of 
completing the condition statement. The 
action stub describes the action(s) to be 
taken, while the action entry provides 
the means of showing completion of the 
actions. 


Decision tables are generally classified 
by the type of information recorded in the 
entries. There are three types generally 
accepted: 

• Limited Entry: This is the most
widely used and, because of its sim- 
ilarity to binary logic, is most suited 
for computer oriented applications. 
Condition entries are limited to a 
Y, N or — (meaning not applicable). 
Action entries are limited to Xs. In 
order to accomplish this, the con- 


dition stub must be written so that a 
true-false condition exists, and the 


action stub must describe the com- 
plete action to be taken. The example 
in this article will be of this type. 

• Extended Entry: In this type of deci-
sion table, the entry portion is merely
an extension of the stub portion. The
stub describes the variable and the
entry describes the possible values
which the variable can assume. This
type of table is quite well-suited for
those situations in which only a few
variables occur, but those
few variables may assume many differ-
ent values. Figure 3 is an example of
this type of table.


Note: For those readers 
who would like to learn 
more about decision tables, 
I recommend the following
books: 


Automatic Data Process- 
ing: Principles and Pro- 
cedures by E Awad and 
DPMA 

Decision Tables and Their 
Practical Application in 
Data Processing by 
Thomas Gildersleeve. 


Both of these books are 
published by Prentice-Hall, 
1970. I would also be
happy to answer any ques- 
tions raised by my article. 








Figure 5: Examples of open and closed decision tables. An open table has as its last action in each rule a branch to the next table in the series. Closed tables return control to the tables that call them upon completion of their routines. (Panels: Open Table Example #1; Closed Table Example #1; Closed Table Example #2. The closed examples end with RETURN and EXIT actions.)
• Mixed Entry: As the name implies,
this type of decision table has rows
which contain either limited or ex-
tended entries. See figure 4 for an
example.
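A limited-entry table maps naturally onto a small data structure. The Python sketch below is illustrative only: the two rules come from the minimum-order clause of the example problem later in this article, and the names `RULES` and `decide` are hypothetical. It checks rules left to right and falls back to an else action when no rule matches.

```python
from typing import List, Optional, Sequence, Tuple

# One entry per condition, in this order.
CONDITIONS = ("regular customer", "order less than minimum")

# Limited-entry rules: condition entries (Y, N, or - for "not applicable")
# and the actions marked with an X. Only two illustrative rules are shown.
RULES: List[Tuple[str, List[str]]] = [
    ("NY", ["reject the order"]),                  # not regular, under minimum
    ("YY", ["fill the order", "ship the order"]),  # regular, under minimum
]


def decide(rules: Sequence[Tuple[str, List[str]]],
           facts: Sequence[bool],
           else_actions: List[str]) -> Tuple[Optional[int], List[str]]:
    """Return (rule number, actions) for the first rule whose entries match."""
    observed = "".join("Y" if fact else "N" for fact in facts)
    for number, (entries, actions) in enumerate(rules, start=1):
        # '-' matches either answer; Y or N must equal the observed answer.
        if all(entry in ("-", seen) for entry, seen in zip(entries, observed)):
            return number, actions
    return None, else_actions


if __name__ == "__main__":
    print(decide(RULES, (True, True), ["investigate error"]))
    print(decide(RULES, (True, False), ["investigate error"]))
```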

As mentioned above, complex situations 
may require more than one table, or you 
may place different types of decisions in 
different tables. Obviously, there must be 
a way for one table to reference another 
and indeed there is. How? Simply by the 
type of table you build. An open table 
has as its last action in each rule a branch 
to the next table in the series. This transfer 
is a permanent one and is accomplished by 
an action stub of GO TO n. A closed table, 
on the other hand, uses an action stub of 
DO n or PERFORM n with the idea that, 
after the called table is completed, control 
will return to the calling table and the 
indicated actions from that point on will 
continue. Return from the called table is 
through an EXIT or RETURN action entry. 
Figure 5 gives examples of both open and 
closed decision tables. 


How to Construct a Decision Table 


The first step in constructing an effective 
decision table is to state the problem in a 
clear and concise manner. For example, 
suppose we wish to construct a table for the 
following hypothetical situation: 


Your firm, which manufactures fridgets 
for home computers, often sells on credit. 
If a customer places an order which
exceeds his/her previously established 
limit, the order should be forwarded to 
the credit manager for approval prior to 
filling and shipping it. However, if the 
customer has purchased more than $600 
in the past six months, he/she is consid- 
ered a regular customer, and in such 
cases tentative approval is assumed and 
the order is filled but not shipped until 
credit approval is received. There is also a
minimum order value of $100 from all 
customers and all orders less than this 
amount must be returned unfilled unless 
the order is from a regular customer in 
which case it may be filled and shipped. 
All orders over $500 in value receive a 
10% discount and all orders over $750 
receive an additional 5% discount. How- 
ever, the discounts apply only for regular 
customers as defined above. 


By stating the situation as we have, we 
have completed the first step in our decision 
table construction (I realize that I said the
statement should be clear and concise, but 
we have to have something to work with!). 

The second step in our construction 
process is to isolate and list both the condi- 


tions which will affect our eventual decision 
and the possible actions we may take: 


Conditions                            Actions

Regular customer                      Request credit approval
Order exceeds credit limit            Fill the order
Order less than minimum               Ship the order
Order less than $500                  Reject the order
Order is over $500, less than $750    Give 10% discount
Order is over $750                    Give 15% discount
                                      No discount


At this point we should stop and examine 
our lists for correctness and add any items 
which have been omitted. In our example, 
the last three items in each list are redundant: 
obviously, a single order cannot possibly 
require all three checks nor is it necessary to 
keep all three actions. It would be much 
simpler to check each order for “over $500”
and “over $750,” assuming that the only
possible other condition will be “under
$500.” Similarly, instead of listing all three
discount possibilities, why not list “give
10%” and “give an additional 5%”? This
covers all possibilities. After our examina- 
tion and the elimination of these redundant 
conditions, we have the following revised 
lists [Note: There is no implied relationship 
between Conditions and Actions at this 
point]: 


Conditions                            Actions

Regular customer                      Request credit approval
Order exceeds credit limit            Reject the order
Order less than minimum               Give 10% discount
Order over $500                       Give additional 5% discount
Order over $750                       Fill the order
                                      Ship the order


The next step is to place these conditions 
and actions into a formal table structure. 
A general rule to follow when constructing 
the actual table is to list the actions in the 
order in which they are to be performed. 
Further, a condition entry is left blank 
(not applicable) only if the condition is 
either not possible or is overshadowed by 
other conditions also present. Our table, 
in skeleton form, appears as in figure 6. 

After we have filled out the condition 
and action stubs of our table, we must 
complete the entry portion by filling in the 
rules. This is accomplished by returning to 
the original problem statement and carefully 
marking the condition entries and the asso- 
ciated action entries. This is shown in 
figure 7a. 

The final step in building our decision 
table is to ensure completeness and eliminate
both redundancy and contradiction. Con- 
tradiction is best eliminated by careful 
Sample Table

Conditions:
    Regular customer
    Order exceeds credit limit
    Credit approval received
    Order less than minimum amount
    Order > $500, less than $750
Actions:
    Request credit approval
    Give 10% discount
    Give additional 5% discount
    Fill the order
    Ship the order
    Reject the order


Figure 7a: A skeleton decision table developed from the preliminary table in figure 6.

Figure 7b: The final corrected decision table for the example in text.
examination of the problem statement to
ensure that the conditions and actions we
entered into the table earlier do not con-
tradict each other. Ensuring completeness
is fairly simple if we understand the “else
rule.” Put simply, this rule says that, if none
of the other rules listed hold, we also have a
specific action to take. In the case of our
table in figure 7a, the “else rule” says we are
to investigate the error condition.
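Finding the combinations the else rule has to catch can be mechanized for a limited-entry table: list every combination of Y and N answers and keep those that no rule covers. A Python sketch, assuming rules are written as strings of entries such as "Y-N" (the function name `uncovered` is hypothetical):

```python
from itertools import product
from typing import List, Sequence


def uncovered(rules: Sequence[str], n_conditions: int) -> List[str]:
    """List every Y/N combination of conditions that no rule covers.

    Each rule is a string of limited entries such as "Y-N"; '-' covers both
    answers. Whatever this returns would fall through to the else rule.
    """
    missing = []
    for combo in product("YN", repeat=n_conditions):
        covered = any(
            all(entry in ("-", answer) for entry, answer in zip(rule, combo))
            for rule in rules
        )
        if not covered:
            missing.append("".join(combo))
    return missing


if __name__ == "__main__":
    print(uncovered(["Y-", "NY"], 2))
```

Some combinations reported this way may be impossible in practice (an order over $750 that is not over $500, for example); sending them to the else rule's "investigate the error condition" action covers them.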


Redundancy 


Eliminating redundancy is a bit more 
complicated. There are various rules and 
methods for doing this, and we will discuss 
only one of them. Certainly this is not the 
only “right” method. Also, keep in mind that 
throughout the following discussion we are 
dealing only with two rules which have the 
same indicated actions. 

The first law for eliminating redundancy
says: 


If, with the exception of one condi- 
tion, two rules have the same condition 
entries and, for that one condition, one 
rule has a Y entry and the other an N 
entry, then the two rules can be com- 
bined into one rule with the entry for 
that condition becoming indifferent (not 
applicable). 


Let’s apply this law to our table in figure 7a. Note that rules 2 and 6 fit the criteria: they have the same action entries, and their condition entries agree except for a single condition that is Y in one rule and N in the other, which makes them candidates for combination. We can thus combine these two rules, with the result shown in figure 7b. Note that rules 3 and 9 almost fit our criteria: the only difference is that there are two conditions with different entries, and our law allows only one. No other rule pairs fit the criteria, so after combining the two rules as in figure 7b, we may safely assume that our table passes this first law of redundancy elimination.
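The first law can be sketched as a small function, assuming the same representation as before (condition entries 'Y'/'N'/'-' plus a list of actions). The rule values used in the check are hypothetical.

```python
def try_combine(rule_a, rule_b):
    """First law: if two rules with the same actions differ in exactly one
    condition, where one has 'Y' and the other 'N', return the merged rule
    with that entry made indifferent ('-'); otherwise return None."""
    (entries_a, actions_a), (entries_b, actions_b) = rule_a, rule_b
    if actions_a != actions_b:
        return None
    diffs = [i for i, (a, b) in enumerate(zip(entries_a, entries_b)) if a != b]
    if len(diffs) != 1:
        return None          # two or more differences: the law does not apply
    i = diffs[0]
    if {entries_a[i], entries_b[i]} != {'Y', 'N'}:
        return None          # the single difference must be a Y against an N
    merged = list(entries_a)
    merged[i] = '-'
    return (tuple(merged), actions_a)

print(try_combine((('Y', 'N', 'Y'), ['ship']), (('Y', 'Y', 'Y'), ['ship'])))
```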

The next test to apply can be stated as 
follows: 


Each pair of rules remaining after 
application of the test above must have 
at least one condition for which one rule 
has a Y entry and the other an N entry. 


Those pairs of rules which meet this test are said to be independent of each other, while those which fail this test are said to be dependent on each other. Dependency at this point in our tests indicates that the table still contains either redundancy (it has a dependent rule pair with the same actions) or contradiction (there is a dependent rule pair with different actions). Let’s examine our table.

Pairing each rule with each of the others, one at a time (eg: rule 1 with rules 2, 3, 4 and so on, then rule 2 with rules 3, 4 and so on), we check the conditions for a Y in one rule and an N in the other. This isn’t as time-consuming as it appears, since we can declare a pair independent as soon as we encounter the first such Y-N condition. After examining all rule pairs, we can see that none of them is dependent. We can therefore assume that our table is indeed nonredundant and that it does not contain any contradictions.
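The pairwise independence test can be sketched as follows, again assuming rules are (entries, actions) pairs. `any` stops at the first Y-N condition, just as the manual check can stop early. The three-rule table in the check is hypothetical.

```python
from itertools import combinations

def independent(entries_a, entries_b):
    """Independent if some condition is 'Y' in one rule and 'N' in the other."""
    return any({a, b} == {'Y', 'N'} for a, b in zip(entries_a, entries_b))

def dependent_pairs(table):
    """List (i, j, kind) for every dependent pair, rules numbered from 1.
    kind is 'redundancy' when the actions agree, 'contradiction' otherwise."""
    found = []
    for (i, (ea, aa)), (j, (eb, ab)) in combinations(enumerate(table, 1), 2):
        if not independent(ea, eb):
            found.append((i, j, 'redundancy' if aa == ab else 'contradiction'))
    return found
```

An empty result corresponds to the conclusion reached in the text: the table is nonredundant and free of contradictions.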

Note: If we had found a dependent rule pair, we would have had to apply the following rules to eliminate the redundancy:


1. If one rule is pure and the other mixed, then the pure rule is contained in the mixed rule and the pure rule may be eliminated. (A pure rule is one in which every entry is either Y or N, while a mixed rule contains at least one indifferent entry.)

2. If both rules are mixed, there is at 
least one pure rule which is common 
to both which you can eliminate from 
one of the original rules. 
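Both rules can be checked mechanically by expanding each rule into the pure rules it stands for. In this sketch a mixed rule is taken to be one containing at least one indifferent ('-') entry, which is what lets it cover several pure rules; the entries in the checks are hypothetical.

```python
from itertools import product

def expand(entries):
    """Set of pure rules (all 'Y'/'N') a rule stands for: each '-' becomes both."""
    choices = [('Y', 'N') if e == '-' else (e,) for e in entries]
    return set(product(*choices))

def contains(mixed, pure):
    """Rule 1: the pure rule is redundant if the mixed rule already covers it."""
    return pure in expand(mixed)

def common_pure_rules(entries_a, entries_b):
    """Rule 2: pure rules covered by both rules; one copy can be eliminated."""
    return expand(entries_a) & expand(entries_b)
```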


We won’t go into these here, since they usually appear only in more complicated applications. I mention them simply to make our discussion complete.

Once our decision table is built and we 
have completed the error checking pro- 
cedures mentioned above, we can use the 
table as a basis for either a preliminary flow- 
chart or, with the addition of the necessary 
IO routines, go directly to the coding phase 
of our programming. The path we take at 
this point depends entirely on how carefully 
we have constructed our table. 


Conclusion 


We have seen how we can go from a gen- 
eralized problem statement to a list of pos- 
sible conditions and actions to a completely 
checked out and (we hope) error-free de- 
cision table. Of course, like any other new 
procedure, you will have to use it several 
times before you become comfortable with 
the process. But no matter how difficult or 
complicated it seems, I urge you to try it
not once but several times in actual program- 
ming situations. After doing so, I’m sure 
you'll agree that using decision tables greatly 
increases your productivity and eliminates 
the situation in which, almost at the end of 
a long program, you discover one little 
condition you forgot back at the beginning, 
which is where you end up again in short 
order!


Programming Entomology 


An entomologist is a bug expert. When he 
sees an insect, it isn’t just a bug to him (in 
fact, he will vociferously protest that not all 
insects are bugs); it has a particular habitat, 
lifespan, favorite food, and breeding pattern. 
Nor is his knowledge just academic; he can 
tell you how to protect yourself from a 
harmful one by killing it or keeping it away. 

The same sort of knowledge is necessary 
for programming. The skilled programmer 
knows what kinds of bugs may attack a 
program, how to track them down, and how 
to keep them from getting there in the first 
place. He knows the ways to get at particular 
bugs, as well as the general treatments which 
are effective against all of them. 

The first thing to realize about bugs 
is that they don’t appear by spontaneous 
generation. They have a creator, and their 
creator is the programmer. (Throughout 
this article, I am speaking only of user
program bugs; hardware bugs are an entirely 
different breed, subject to different laws, 
and systems software may be beyond your 
control.) No matter how outrageously the 
program is acting, it’s only following orders. 
So what you have to ask about a bug in 
your program is: how did you put it there? 
What kinds of mistakes are you prone to 
make? If you caught a certain bug in one 
part of the program, might you have put 
the same kind of bug elsewhere as well? 
“Thou art God” . . . and thou must take
care of thy creation. 

But the fact that each programmer 
creates his own bugs doesn’t mean there 
aren't species of bugs found in everyone’s 
programs. Knowing about these species can 
be a great timesaver, especially when the 
species can be identified by the effects. 

One of the most common bugs is the 
Clobbered Value, found where the pro- 
grammer assumes the content of a register 
or the value of a variable is the same as 
before, but it isn’t. Take this attempt to 
exchange the values of two variables: 


10 LET X=Y
20 LET Y=X


This fails because when statement 20 is executed, the value of X has already been clobbered by the previous statement, with the result that Y never gets changed at all.
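The usual cure is to save the value that is about to be clobbered in a temporary variable first; in BASIC that would be 10 LET T=X, 20 LET X=Y, 30 LET Y=T. The Python sketch below mirrors both versions step by step (Python's own `x, y = y, x` hides the temporary, so it is spelled out here on purpose).

```python
def swap_clobbered(x, y):
    # Same order as the faulty BASIC: X is overwritten before Y reads it.
    x = y
    y = x
    return x, y

def swap_with_temp(x, y):
    # Save X first, so its old value survives the assignment to X.
    t = x
    x = y
    y = t
    return x, y

print(swap_clobbered(1, 2), swap_with_temp(1, 2))
```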

Clobbered Values are frequently found 
on subroutine exits. It’s easy to write a 
harmless looking CALL or GOSUB (possibly 
to a routine you haven’t written yet) and 
assume everything will remain the same. But 
strange things can happen if the subroutine 
unexpectedly changes some values. 

A not too distant relative of the Clob- 
bered Value is the Zapped Stack, found only 
in machine and assembly code. It appears 
most often by pushing items onto the pro- 
gram’s stack at the start of a subroutine, 
then failing to pop them, or popping too 
many things at the end. Another way to 
invite this bug is to use the stack pointer 
for some other purpose during the course 
of a subroutine. 
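Why an unbalanced push is so destructive can be shown with a toy simulation of one stack shared by CALL/RET and PUSH/POP. This is a sketch, not real 8080 behavior; the addresses and register contents are hypothetical values chosen for illustration.

```python
def call_and_return(pops_saved_value):
    """Simulate CALL, PUSH, (POP), RET on a single stack and return the
    address that RET transfers control to."""
    stack = []
    stack.append(0x0103)   # CALL pushes the return address (hypothetical)
    stack.append(0x00FF)   # PUSH saves a register (hypothetical contents)
    # ... body of the subroutine ...
    if pops_saved_value:
        stack.pop()        # POP restores the register
    return stack.pop()     # RET jumps to whatever is on top of the stack

print(hex(call_and_return(True)), hex(call_and_return(False)))
```

With the POP missing, RET "returns" to the saved register contents treated as an address, which is exactly how a tranquil program turns into a memory zapping monster.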

Subroutines are also the habitat of the 
Botched Call. A certain protocol is needed 
to call any particular subroutine. If, when 
you write a call to a subroutine, you expect 
a value to be returned in the wrong place, 
or you assume the subroutine will do some- 
thing which it actually won’t (or vice versa), 
this bug will have gained a foothold. The 
difference between a Clobbered Value and 
a Botched Call is that when you have the 
latter, the subroutine is doing the right 
thing; the calling program is just mistaken 
in its expectations. 

Another species of bug lurks in jumps, 
branches, and GOTOs. The Branch Bug 
is so difficult to fight that serious attempts 
have been made to wipe out its habitat; 
languages and programming styles (struc- 
tured programming) have been developed 
that use no jumps. The Branch Bug comes 
in two varieties: jumping to the wrong 
place, and jumping to the right place with 
inadequate preparation. The first of these 
is easy to produce in languages where 
statement labels have to be numbers (eg:
BASIC and FORTRAN, especially BASIC, 
where every statement has to be numbered 
whether it’s ever going to be a jump destina- 
tion or not). The jump with inadequate 
preparation is similar to the Botched Call, but it can be harder to track down if the program has a complex flow pattern.

Clobbered Value Bug: Your program changes the value of a variable at a time and place which is unintended. The detection difficulty ranges from the obvious (after it is found) to the subtle (before it is found).

Botched Call Bug: The Botched Call Bug is like the proverbial square peg in a round hole: unless the peg or the edge of the hole yields, sparks will fly.

Zapped Stack Bug: Stack oriented machines and software are both very egalitarian with respect to pushes and pops. They like to have the same number of items pushed as are later popped, or else they’ll transform themselves from tranquil and placid programs into memory zapping monsters.

A few special methods are applicable 
to fighting the Branch Bug. One of these 
is program flow analysis. A look at the 
possible paths a program can take will 
often reveal some of these bugs. Is there 
a part of the program that can never be 
reached? Are there traps in the program, 
loops that can never terminate? Are there 
jumps which will result in variables being 
used without having been set to a value? 
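Part of this flow analysis can be automated. The sketch below assumes the program has already been reduced to a flow graph mapping each statement number to the statements that may follow it; the graph `FLOW` is a hypothetical BASIC program. It reports statements that can never be reached, and "traps": statements that can be reached but from which the END statement can never be reached.

```python
def reachable(graph, start):
    """Statements reachable from start by following successor edges."""
    seen, todo = set(), [start]
    while todo:
        n = todo.pop()
        if n not in seen:
            seen.add(n)
            todo.extend(graph.get(n, []))
    return seen

def analyze(graph, start, end):
    nodes = set(graph) | {s for succ in graph.values() for s in succ}
    reverse = {n: [] for n in nodes}
    for n, succ in graph.items():
        for s in succ:
            reverse[s].append(n)
    live = reachable(graph, start)
    can_finish = reachable(reverse, end)    # statements that can lead to END
    return sorted(nodes - live), sorted(live - can_finish)

# Hypothetical program: 40 is GOTO 40, and nothing jumps to 60.
FLOW = {10: [20], 20: [30, 50], 30: [40], 40: [40], 50: [99], 60: [99], 99: []}
unreachable, traps = analyze(FLOW, 10, 99)
```

This only finds the structural problems; whether a loop really can never terminate also depends on the data, which a graph like this does not capture.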

In languages like BASIC, where every 
statement is labeled, it’s helpful to set off 
statements that can be reached by jumps 
either by using special statement numbers 
or by pointing them out in comment state- 
ments. In any language, the statements 
that can be reached by jumps should be 
logical breaking points in some sense, 
places where a new unit of work begins. 
Except in desperate situations where 
economy is all-important, jumps should 
be used to satisfy the logic of the program, 
not to save a few instructions. 

If a subroutine call can be used instead 
of a jump, it probably should be used. A 
subroutine will send you back where you 
came from, so figuring out the flow of the 
program is easier. For many purposes, 
you can treat a subroutine as a unit when 
studying the program; as a single instruction 
that happens to do complicated things. 
You can’t do this with the instructions 
reached by a jump. 

The next bug in our survey feeds on 
apples and oranges. More generally speaking, 
the Mismatched Unit is found where the 
units or dimensions of the quantities being 
used in a program aren’t the ones actually 
needed. Take the program statement LET 
V = D * T, where D is a distance in miles, 
T is the time traveled in hours, and V is 
intended to be the traveler’s average velocity 
in miles per hour. By using simple algebra 
on the units, you can see that the result 
obtained will be units of miles times hours, 
not miles per (ie: divided by) hour. 
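The "simple algebra on the units" can itself be programmed. This sketch assumes units are represented as a dictionary mapping a base unit to its exponent, so multiplying adds exponents and dividing subtracts them; the names `MILES`, `HOURS` and `MPH` are illustrative.

```python
from collections import Counter

def mul(u, v):
    """Units of a product: add exponents, dropping any that cancel to zero."""
    r = Counter(u)
    r.update(v)
    return {k: e for k, e in r.items() if e != 0}

def div(u, v):
    """Units of a quotient: subtract exponents."""
    r = Counter(u)
    r.subtract(v)
    return {k: e for k, e in r.items() if e != 0}

MILES = {'mile': 1}
HOURS = {'hour': 1}
MPH = {'mile': 1, 'hour': -1}

print(mul(MILES, HOURS) == MPH)   # D * T: miles times hours, the bug
print(div(MILES, HOURS) == MPH)   # D / T: miles per hour, as intended
```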





Bugs of this type are harder to spot when 
the mismatched variables are further apart 
in the program, but consistency will keep 
them from occurring. Simply be sure you 
know in advance what units each variable 
has to come in. 

Assembly and machine language program- 
ming allow an especially messy type of 
Mismatched Unit to show up: mismatches 
between addresses and data, or between 
absolute addresses and relative addresses 
(values to be added to a base address). To 
avoid this bug, watch out for the different 
addressing modes of different instructions. 

Another bug with a specialized habitat 
is the Fencepost Bug, named for its ten- 
dency to rest in problems like this one: 
“If you are putting up a wire fence 100 
feet long, supported by posts every 10 feet, 
how many posts do you need?” Another 
name for this bug is the Boundary Condition 
Bug; it’s always found in connection with 
the start or end of some sequence, where 
special treatment is needed. One form 
manifests itself in confusion over whether 
the first element of a group is number 0 or 
number 1. Another is found in the attempt 
to relate each element of an array to the 
next, as in this statement: 


IF T(I) < T(I+1) GO TO 100


Try this one setting I equal to the dimension
of T.
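Both forms of the Fencepost Bug come down to counting ends correctly. A Python sketch (0-based, unlike BASIC's usual 1-based arrays), assuming the fence length is an exact multiple of the post spacing: the fence needs one more post than it has gaps, and a loop comparing each element with the next must stop one element early. In BASIC the corresponding loop would run FOR I=1 TO N-1 rather than TO N.

```python
def posts_needed(length, spacing):
    # length // spacing gaps, plus one extra post to close the last gap.
    return length // spacing + 1

def first_ascent(t):
    """First 0-based index i with t[i] < t[i+1], or None.
    i stops at len(t) - 2, so t[i + 1] never runs off the end."""
    for i in range(len(t) - 1):
        if t[i] < t[i + 1]:
            return i
    return None

print(posts_needed(100, 10))  # the fence in the text
```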

Finally, we come to the most insidious 
of all bugs, the Timing Bug. The character- 
istic that makes this bug so fearsome is that 
a program infested by one may run correctly 
once but not the next time; it may even run 
correctly 99 times but fail on the hundredth, 
using exactly the same data each time. To 
make matters worse, running programs in 
single step mode will usually drive Timing 
Bugs into undetectable hiding. 

As the name suggests, the Timing Bug is one that shows up depending on the order in which asynchronous events (events that have an unpredictable relationship in time) occur. Systems that have interrupt facilities are especially prone to being attacked by Timing Bugs, since an interrupt routine may be executed at a different point in the program each time it’s run. An interrupt routine may, for instance, set up certain variables to be used by the main program. If another interrupt of the same kind can occur before the variables have been processed by the main program, and if that interrupt changes those variables, unpredictable results can occur. Yet most of the time, interrupts may not occur that close together, so the bad result is said to be nonrepeatable. This means that repeated runs of the program can’t be used to systematically close in on the bug.

Mismatched Unit Bug: A result of inadequate analysis of a calculation, the Mismatched Unit Bug results in strange elixirs. When both apples and oranges are thrown into the analytical engine, what is the nature of the juice which flows out?


The Timing Bug: This most subtle of all bugs spends most of its time relaxing, then suddenly takes a swipe at apparently random times.
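The interrupt scenario described above can be made repeatable by replaying a fixed sequence of events, which real hardware never guarantees. In this sketch 'irq:<value>' stands for an interrupt delivering a value and 'main' for the main program taking one pending value; both event names are hypothetical. A single shared variable loses data when two interrupts arrive before the main program catches up, while a queue does not. On a real machine the queue itself would still have to be protected, for example by disabling interrupts while the main program removes an item.

```python
from collections import deque

def replay(events, use_queue):
    processed = []
    pending = deque()
    shared = None
    for ev in events:
        if ev.startswith('irq:'):
            value = ev[4:]
            if use_queue:
                pending.append(value)
            else:
                shared = value          # clobbers any value not yet processed
        elif ev == 'main':
            if use_queue:
                if pending:
                    processed.append(pending.popleft())
            elif shared is not None:
                processed.append(shared)
                shared = None
    return processed

EVENTS = ['irq:A', 'main', 'irq:B', 'irq:C', 'main', 'main']
print(replay(EVENTS, use_queue=False), replay(EVENTS, use_queue=True))
```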


A Timing Bug can also live on direct 
memory access (DMA). Some mass 
storage devices can read or write data in 
bulk without the intervention of the 
processor, using those memory access 
cycles which the processor doesn’t use. 
The length of time a DMA transfer will 
take is, at best, very difficult to predict; 
so a Timing Bug can strike if memory 
which is accessed by DMA can be accessed 
or modified by the processor. 

Since Timing Bugs are so hard to hunt 
down, extra efforts should be made to avoid 
giving them a foothold. Be extra careful in 
writing interrupt handlers or DMA com- 
mands. Watch for places where interrupts 
need to be disabled. As for the identification
of Timing Bugs, the following rule is
useful: if you can prove, in a precise instruc- 
tion by instruction study, that what 
happened couldn’t possibly have happened 
from the execution of those instructions, 
suspect a Timing Bug; something else was 
happening during the execution of those 
instructions. 

Incidentally, it’s possible to encounter 
bugs much like Timing Bugs even without 
interrupts or DMA. An input or output 
device, such as a keyboard, is asynchronous 
with the program; the exact behavior of the 
program will depend on the behavior of 
these devices. For instance, a program 
which accepts keyboard input and accu- 
mulates it in a buffer may work fine for 
you, yet a faster typist may make it fail 
because no provision was made for the 
chance of exceeding the buffer’s capacity. 
But in a situation like this, it’s at least 
possible to look at every call to an input 
routine and tell what its effects might be. 
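The provision that the keyboard example lacks is a capacity check on every store into the buffer. A minimal sketch, assuming the buffer simply refuses extra characters; what to do with a refused key (ring the bell, discard it) is a design choice the text does not specify, so `dropped` here just counts them.

```python
class KeyBuffer:
    """Fixed-capacity input buffer that refuses characters once full,
    instead of writing past its end."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.chars = []
        self.dropped = 0

    def put(self, ch):
        if len(self.chars) >= self.capacity:
            self.dropped += 1      # buffer full: ignore the key
            return False
        self.chars.append(ch)
        return True

buf = KeyBuffer(4)
for c in "HELLO":      # a fast typist: one key too many
    buf.put(c)
```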

This completes our survey of important 
species of bugs (I have nothing useful to say 
about the Common Typo, though it does 
have to be fought). Others will no doubt 
discover voracious breeds which I have 
overlooked, and perhaps they will improve 
on some of the classifications I have
mentioned. But knowing about the species
which are listed here will hopefully be 
a help in identifying and killing the bugs 
in your own programs. 








Branch Bug: Jumping blindly about in memory, the Branch Bug is always on a collision course with valid execution of a program.

This doesn’t mean that classifying bugs is all there is to entomology, either the biological kind or the kind being discussed here. Entomology wouldn’t be a science if it couldn’t say things that are true of all bugs, regardless of species. What I have discussed so far is differentiation; but integration is equally important.

The basic fact that unifies all bugs is the 
one which I mentioned at the beginning of
this article: they’re all creations of the pro- 
grammer. And this fact allows the use of a 
broad-spectrum killer against all bugs: 
DDT, standing for Design, Documentation, 
and Testing. Let’s take them in order: 

• Design. The best way to stay bug-free is to write programs without bugs. This may sound like superfluous advice, but programmers (myself included) are often tempted into writing programs quickly, rather than writing them well. The attempt usually fails, since such programs typically cost more in debugging time than was saved in writing them.

An error born of pragmatism is to 
suppose that it doesn’t matter how you 
design a program, as long as it works. 
There are two problems with this idea. 
The first is that if you use any method 
that appears to do the job, without 
regard for well organized design, it will 
be a lot harder to ever make the program 
work. The second problem is that even if 
the program works for its immediate pur- 
pose, it will be harder to make changes to 
meet new needs, since a particular ad hoc 
solution may not be generalizable. 

The first step in designing a program is 
to lay out a complete plan of attack before 
writing it. Decide what data structures you 
will need, and what method you will use. 
Data structures are often the key to the 
whole program. First plan the program in a 
few large steps; then decide what each step 
will consist of in more specific terms; then 
repeat the procedure until you’re down to 
the level of your chosen programming lan- 
guage. This is the principle of structured pro- 
gramming, and also of mental unit-economy: 
avoid having to think about more things at 
once than your mind can handle. If you can 
keep everything relevant to a particular 
operation in your head, you’re not likely to 
put bugs into its implementation. 

Flowcharting is often recommended for 
program design, but it’s cumbersome and 
doesn’t lend itself to representing a hierar- 
chical design. Another approach is to use a 
well designed programming language, such as 
ALGOL or APL, to write the design. Since 
you aren’t actually going to run the program 
in that language, you can assume any fea- 
tures that would make the job easier. The 


point of this is to have a representation of 
the program that you can understand with- 
out strain, so that you don’t lose sight of 
your overall plan while chasing down details 
of implementation. If you do have bugs 
after doing this, at least they won’t be part 
of the whole design of the program. 

• Documentation. The main reason for
writing up the way a program works isn’t to 
explain it to someone else; it’s to make sure 
you understand it yourself. Documentation 
shouldn't be an afterthought; it should begin 
with the design of the program (when you 
write what it is going to do), and continue 
with comments written along with the 
instructions. 

Good documentation isn’t found in sheer 
number of comments (though there should 
be a lot); it’s found in comments that ex- 
plain the operation of the program. Com- 
ments are especially needed for data, sub- 
routines, and points reachable by jumps. 
Variables and constants should be explained 
so that the reader will see how they can be 
used; this allows us to spot threats to 
them, such as Mismatched Units and Clob- 
bered Values. If the language allows, give 
constants names rather than using their 
numeric values throughout the program; 
this makes updating easier and renders 
the Common Typo’s attacks more con- 
spicuous. Subroutines should be prefaced 
with a description of how they are called, 
what inputs are needed, what values are 
returned, and what information may be 
destroyed in the process. Jump points 
should have an explanation of the con- 
ditions under which they are reached. 

To make a program at least partly self- 
documenting, the name of a routine or 
variable should indicate its use. One of the 
major weaknesses of BASIC is that it doesn’t 
allow this to be done very much; this is a 
reason for having a lot of comment state- 
ments to explain what BASIC variables 
and subroutines are used for. 

Just as a sample, here’s a preface to a 
hypothetical 8080 assembly language sub- 
routine (see box). The comments explicitly 
define linkage conventions. 

The protection provided against Botched 
Calls should be obvious. 


• Testing. If you follow the approach outlined so far, you’ll have a better chance of getting your program to work, but you may still have planted a few bugs inadvertently. So you have to test the program before declaring it bug-free. Testing should begin with a simple version of the program, if possible; but it should begin only after the program has been written with enough care so that there’s a chance of not finding any bugs.

Use whatever debugging tools are avail- 
able. High-level languages will usually pro- 
vide useful information when the program 
goes wrong. Versions of BASIC that allow
single statements to be executed make it
possible to find out something about the
conditions under which an error occurred.

When working in machine language, a 
debugging program will ease discovery of 
bugs. Such a program allows the user to 
put breakpoints into the program being 
tested (returning control to the debugger 
when the program counter reaches a certain 
address) and to examine and modify regis- 
ters and memory. These programs range 
from simple 1K monitors to powerful
symbolic debuggers like Digital Equipment 
Corporation’s DDT (Dynamic Debugging 
Tool, no relation to the name as used here). 
Having one of these in ROM can be a 
tremendous help. 

If the program works the first time, try it 
again with different data to make sure. 
Check out simple cases. Sometimes a pro- 
gram will work in complicated cases, but be 
bitten by the Fencepost Bug in simple ones. 
Check out more complicated cases. If 
possible, use a random number table as a 
source of test data, along with handpicked 
cases. 
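Mixing handpicked cases with random ones is easy to automate when a trusted reference answer exists. In this sketch the routine under test is a hypothetical insertion sort standing in for your own code, and Python's built-in `sorted` serves as the reference. The random generator is seeded so that any failure can be reproduced exactly.

```python
import random

def insertion_sort(items):
    """Routine under test (a stand-in for your own code)."""
    out = list(items)
    for i in range(1, len(out)):
        key = out[i]
        j = i - 1
        while j >= 0 and out[j] > key:
            out[j + 1] = out[j]
            j -= 1
        out[j + 1] = key
    return out

def run_tests(trials=200, seed=1):
    # Handpicked simple cases first: this is where fencepost bugs hide.
    cases = [[], [7], [2, 1], [1, 1, 1], [3, 2, 1]]
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(0, 20)
        cases.append([rng.randint(-50, 50) for _ in range(n)])
    failures = [c for c in cases if insertion_sort(c) != sorted(c)]
    return len(cases), failures

print(run_tests())
```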

If the program doesn’t work the first 
time, try it again with different data. Aim 
for the simplest case possible. If you can
get the program to do something right, 
that will cut down the number of places 
where bugs may be lurking. 

When a program is being tested, the work
is easiest if execution comes to a screeching
halt as soon as something goes wrong. A 
program may be able to run a while after 
crucial damage has occurred, only to 
clobber all of memory before stopping. 
If this happens, it can be almost impossible 
to localize the source of the disaster. But 
if the program makes periodic checks for 
error conditions (such as impossible values 
or invalid relationships) and reports them, 
there’s a better chance of discovering just 
where things went wrong. For instance, 
a routine that fills a block of memory 
between two addresses might check to make 
sure that the low address is really lower 
than the high address. Redundant tests 
may slow down the program, but they 
can be taken out when all the bugs are 
known to be dead. 
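The memory-fill check mentioned above might look like this in Python, with a `bytearray` standing in for memory. The sketch assumes an inclusive range, so it accepts low equal to high, and it also checks that both addresses lie inside memory; `CheckFailed` is a hypothetical exception name. Raising at once is the Python equivalent of the screeching halt.

```python
class CheckFailed(Exception):
    pass

def fill(memory, low, high, value):
    """Fill memory[low..high] (inclusive) with value, after sanity checks."""
    if not (0 <= low <= high < len(memory)):
        raise CheckFailed(f"bad fill range {low}..{high} for size {len(memory)}")
    for addr in range(low, high + 1):
        memory[addr] = value

mem = bytearray(8)
fill(mem, 2, 4, 0xFF)
```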

The overriding consideration to remem- 
ber in the use of this Design, Document and 
Test technique is that it’s open-ended. It
will, in principle, kill any kind of bug; but 
a new approach to design, a better scheme 
of documentation, or a novel test may be 
needed for subtle species. Approaching 
bugs scientifically means thinking about 
them. It means recognizing that any bug 
will have important similarities to pre- 
viously encountered bugs; and that it may 
have equally important differences. So 
when you find yourself struggling to dis- 
cover what’s wrong with a program whose 
behavior is incomprehensible, you can 
console yourself with the thought that you 
may be about to make an exciting entomo- 
logical discovery that you can use repeat- 
edly.


; COMPUTE PROBABILITY OF WIDGET BREAKAGE
; INPUT  - MASS OF WIDGET (GRAMS) IN REGISTER PAIR BC
;          AGE OF WIDGET (DAYS) IN REGISTER PAIR DE
; OUTPUT - PROBABILITY OF BREAKAGE (PERCENT) IN REGISTER PAIR BC
; ALL OTHER REGISTERS ARE CLOBBERED







PROGRAM DETAILS 




About This Section 


This section deals mainly with one of the more difficult aspects of a program’s structure: tables. For any but the most elementary applications the programmer finds that he (she) needs to construct some kind of table for a variety of purposes: branching, symbols, data. In fact, note that virtually any file of data can ultimately be thought of as a table. This section should answer many of your questions about a variety of tables.

The second topic covered in this section is how to create and maintain binary trees. This subject has a reputation which scares a lot of people away from using trees. But when working with large amounts of unsorted data, the fastest way to reference any particular piece of it is often to arrange it as a binary tree. Now there is no longer anything to fear about binary trees.


An Introduction to Tables 


The construction and use of program 
tables is the gateway to developing powerful 
programs. The new programmer may have 
trouble getting to know the concept of 
tables, but time spent learning about tables 
is well worth the effort. 

The first few programs to go into your 
home computer are likely to be written 
using a multitude of IF tests: If a value 
equals 1, branch to a particular routine; if 
equal to 2, another branch; if over 5, yet 
another branch; and so on. After a while 
this gets to be a lot of work. Programmers 
quickly learn to use table structures to 
simplify decision making. 
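As a sketch of the idea, a dictionary can replace a chain of IF tests: the value selects the routine directly. The command codes and routine names here are hypothetical; a range test such as "over 5" would still need an IF, or a table keyed on the range.

```python
def add_record():
    return "add"

def delete_record():
    return "delete"

def list_records():
    return "list"

def unknown():
    return "unknown command"

# One table replaces IF CODE=1 ..., IF CODE=2 ..., IF CODE=3 ...
DISPATCH = {1: add_record, 2: delete_record, 3: list_records}

def dispatch(code):
    return DISPATCH.get(code, unknown)()

print(dispatch(2), dispatch(9))
```

Adding a new command then means adding one table entry rather than another branch.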

Tables are called by many names, de- 
pending on the language and the application: 
arrays, vectors and matrices, to name three. 
Even the concept of a “file” is usually just a
large table which follows the same structural 
rules but is stored on disk or tape. 


Table Elements 


Most of the tables we meet in books, 
forms and so on consist of data arranged 
in rows and columns. Each row usually 
contains a record about something. Name, 
address, age, phone number might be the 
record of a schoolmate. Each item of this 
record, such as name, is called a field. In 
most cases, each record contains the same 
number of fields; this is called a rectangular 
table because of its appearance when 
printed, and is by far the easiest type to 
handle. 

Rows and columns can be interchanged, 
of course, by laying the table on its side. 
Let’s look at two ways to encode this small 
table: 


Name   Age   Phone
Joe    14    515-3838
John   18    216-3001
Pete   17    414-3377


First we could encode each line this way: 


James Butterfield


record 1   field 1   Joe
           field 2   14
           field 3   5153838


This is the most common, and usually the 
handiest way to set up the table. It’s logical, 
easy to change or to add new items, and not 
difficult to program a search routine for. All 
the data for a particular line of the original 
table is in one record. However, during this 
search, we must leap 12 bytes or so each 
time we wish to examine a new record. This 
may or may not be convenient to do, de- 
pending on hardware characteristics. By 
laying the table on its side, we could write: 


record 1   field 1   Joe
           field 2   John
           field 3   Pete

record 2   field 1   14
           field 2   18
...etc


This method is in some ways like de- 
voting a separate table to each kind of data 
in the big table: a table of names, a table of 
ages, etc. This type of organization might 
make it a little easier to search for a name, 
but it becomes tougher to add a new name 
to the list, and harder to read. But either 
way works. 
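The two encodings can be compared side by side in Python, using the small table from the text. In the row layout each record keeps its fields together; in the "on its side" layout there is one list per field, tied together only by position, so adding a person means appending to every list and keeping them aligned.

```python
# One record per row: all fields of a person together.
ROWS = [("Joe", 14, "5153838"), ("John", 18, "2163001"), ("Pete", 17, "4143377")]

# The table laid on its side: one list per field.
NAMES = ["Joe", "John", "Pete"]
AGES = [14, 18, 17]
PHONES = ["5153838", "2163001", "4143377"]

def age_by_rows(name):
    for rec_name, age, _phone in ROWS:
        if rec_name == name:
            return age
    return None

def age_by_columns(name):
    # Search the compact name list only, then use the same position in AGES.
    if name in NAMES:
        return AGES[NAMES.index(name)]
    return None
```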


Order of Items 


One of the most important decisions you 
must make in designing a table is how to 
order the records. For small tables it 
doesn’t matter very much. But as tables get 
bigger, it becomes important not to waste 
time on lengthy searches. 

At first glance, the simple answer is to 
put the most often used items at the top of 
the table where they’ll be found first, a pro- 
cedure which frequently works well. But 
you must know roughly how often each 
table item is likely to be used. If the usage 
pattern changes, your table lookup becomes 
inefficient. Beware of elaborate schemes to
rearrange the table order as usage changes:
they can quickly use up more time than they 
save. 

An excellent method for ordering tables 
is to use the table address itself as the item 
to be matched. Let’s clarify this with an 
example. Suppose we have a character in 
Baudot (5 level) code that we want to trans- 
late, say, to ASCII. The lowest value possible 
is blank, or 00000 (decimal zero). The
highest value is the letters shift, or binary 
11111 (decimal 31). If we add this char- 
acter, as a binary number, to the table base 
address, we’ll create an address ranging from 
TABLE+0 to TABLE+31. In each of these 
table locations, the corresponding ASCII 
character will be stored. We’d have to make 
provision for both upper case and lower case 
Baudot, of course. The important thing 
about this kind of table is that we never have 
to search it. We go straight to the address 
we want. 
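A sketch of a directly addressed table: the 5-bit code is the offset from the start of the table, so lookup is one indexing step with no search. Only entry 0 (blank) and entry 31 (letters shift, which has no ASCII equivalent and is marked `None`) follow the text; every other entry is a placeholder, not the real Baudot assignment, and the shift handling the text mentions is left out.

```python
# Hypothetical 32-entry table indexed directly by a 5-bit code.
TABLE = [' '] + [f'<{code:05b}>' for code in range(1, 31)] + [None]

def translate(code):
    """The code itself is the offset from the table base: no search needed."""
    if not 0 <= code <= 31:
        raise ValueError("not a 5-bit code")
    return TABLE[code]
```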

The most common way of ordering items 
in a table is sequential, ie: in ascending or 
descending order, alphabetically or numeri- 
cally. Usually we must pick one particular 
field for the sequence, the one we expect 
to search most often. 

We get many advantages when we have a 
sequential table. The program can detect 
right away if it has “gone past” the item it’s 
looking for, so that it won’t waste time 
searching through the rest of the records. 
With a little more programming effort, we 
can write a binary search program that 
passes through a table very quickly. The bi- 
nary search routine works by examining the 
middle of the table and deciding if the de- 
sired item is above or below this point. From 
then on, the program concentrates exclusively
on the remaining half of the table, and
looks at its midpoint in the same way. Each 
step cuts the remaining portion of the table 
in half; eventually the desired location is 
found or a conclusion of “no match” results. 
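The binary search just described, as a minimal Python sketch over a list sorted in ascending order. Each pass examines the middle of the remaining portion and discards the half that cannot hold the key.

```python
def binary_search(table, key):
    """Index of key in the ascending list table, or None for "no match"."""
    low, high = 0, len(table) - 1
    while low <= high:
        mid = (low + high) // 2
        if table[mid] == key:
            return mid
        if table[mid] < key:
            low = mid + 1      # key can only be in the upper half
        else:
            high = mid - 1     # key can only be in the lower half
    return None
```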

A sequential table is the only type that can be used for a continuous value calculation. You may recognize the following partial table:

Income
less than 2350
less than 2375
less than 2400

This table associates a continuous value, income, with unique tax amounts. If your income was $2378.54, you do not escape tax just because there isn’t an exact value of $2378.54 in the table. For your program to find such an intermediate value, the table must be sequential.
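Python's standard `bisect` module performs this kind of lookup on a sequential table. The income limits below come from the partial table; the tax amounts are hypothetical placeholders. `bisect_right` returns the first position whose limit is greater than the income, which is exactly the "less than" bracket.

```python
from bisect import bisect_right

LIMITS = [2350, 2375, 2400]   # "less than" limits from the partial table
TAXES = [100, 104, 108]       # hypothetical tax amounts

def tax_for(income):
    i = bisect_right(LIMITS, income)   # first limit strictly above income
    if i == len(LIMITS):
        raise ValueError("income beyond the end of this partial table")
    return TAXES[i]

print(tax_for(2378.54))   # falls in the "less than 2400" bracket
```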

There are several drawbacks to sequential 
tables. The first is the problem of getting the 
table in sequential order and keeping it that 
way during deletions and additions. The 
second is that only one field is in sequence. 
This means that the user may have to re-sort 
the whole table to start searching on a new 
field. 


Advanced Techniques 


When it is desired to arrange a table in 
some order, there may be some difficulty 
moving the items around, especially if they 
are large and clumsy. 

One way to get around this is to leave the 
data in its original order and build a separate 
table called an index which gives the order 
in which the data should be read. This way, 
instead of moving the data around, the index 
is simply changed as necessary. 
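A sketch of indexing, reusing names and ages from the earlier small table: the records stay in the order they were entered, and each index is just a list of record positions in the desired reading order. Two indices give the same file two orders at once.

```python
# Records stay where they were entered; only the indices are rearranged.
RECORDS = [("Pete", 17), ("Joe", 14), ("John", 18)]

by_name = sorted(range(len(RECORDS)), key=lambda i: RECORDS[i][0])
by_age = sorted(range(len(RECORDS)), key=lambda i: RECORDS[i][1])

def read_in_order(index):
    return [RECORDS[i] for i in index]

print(read_in_order(by_name))
print(read_in_order(by_age))
```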

Another way to achieve a similar effect is 
by chaining. This attaches an extra field to 
each record which points to the record to be 
looked at next. The program must have a 
starting point that tells which record is to
be examined first. From then on, the pro- 
gram follows the chain to the last record. 
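Chaining can be sketched the same way. In this illustration (the dictionary layout and names are hypothetical), each record carries a `next` field giving the position of the record to examine next, `None` marks the last record, and `start` is the starting point.

```python
# Each record carries an extra "next" field holding the position of the record
# to be looked at next; None marks the end of the chain.
records = [
    {"name": "Smith", "next": None},
    {"name": "Adams", "next": 2},
    {"name": "Jones", "next": 0},
]
start = 1  # the starting point: position of the record examined first


def follow_chain(records, start):
    """Visit records by following the chain from the starting point."""
    names = []
    pos = start
    while pos is not None:
        names.append(records[pos]["name"])
        pos = records[pos]["next"]
    return names


print(follow_chain(records, start))  # ['Adams', 'Jones', 'Smith']
```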

Indexing and chaining are both relatively 
complex, but they have one important ad- 
vantage: the same file can have two indices 
or two chains so that it is simultaneously 
sorted two different ways. This feature can 
sometimes eliminate many time-consuming 
sorts. 
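To illustrate the two-way ordering, here is a sketch, assuming hypothetical records with a name and a year, in which the same three records carry two independent chains. Walking one chain gives name order; walking the other gives year order; no record is ever moved.

```python
# The same three records carry two independent chains: one in name order and
# one in year order. Neither ordering moves a record.
records = [
    {"name": "Smith", "year": 1962, "next_name": None, "next_year": 1},
    {"name": "Adams", "year": 1975, "next_name": 2, "next_year": None},
    {"name": "Jones", "year": 1958, "next_name": 0, "next_year": 0},
]
start = {"next_name": 1, "next_year": 2}  # first record of each chain


def walk(link):
    """Follow one chain from its starting point and return the names visited."""
    names, pos = [], start[link]
    while pos is not None:
        names.append(records[pos]["name"])
        pos = records[pos][link]
    return names


print(walk("next_name"))  # ['Adams', 'Jones', 'Smith']
print(walk("next_year"))  # ['Jones', 'Smith', 'Adams']
```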

Tables which are not rectangular are a 
source of difficulty. If we are recording, 
for example, names of parents and their 
children, we soon face the problem of 
some parents having only one child, while 
others have seven or more. Should we allow 
seven slots for each set of parents and waste 
precious memory? We could build a complex 
table structure to allow for a variable num- 
ber of fields (children). This is practical, of 
course, but sometimes we can eliminate the 
problem by making the table into a list of 
the children rather than the parents. 
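A small sketch of the "list of children" arrangement, assuming hypothetical family names: each row holds one child and names its parents, so no child slots are reserved or wasted.

```python
# One row per child instead of a fixed number of child slots per set of parents.
# A family with one child costs one row; a family with seven costs seven.
children = [
    ("Ann", "Lee"),
    ("Bob", "Lee"),
    ("Cy", "Ortiz"),
]


def children_of(family):
    """Collect the children recorded for one set of parents."""
    return [child for child, parents in children if parents == family]


print(children_of("Lee"))    # ['Ann', 'Bob']
print(children_of("Ortiz"))  # ['Cy']
```

The table is now rectangular again (every row has exactly two fields); the price is that finding one family's children means scanning or indexing the list.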

Another special case which is often 
encountered is the triangular table, which 
resembles a square split along the diagonal, 
with the two halves containing the same 
numbers. For example, if you calculate a 
table of mileages between cities, you don’t 
need to store both the Buffalo to Denver 
and the Denver to Buffalo mileages; they 
are of course the same. But trying to store 
only half the table to save memory turns 
out to be a difficult task. You’ll need a 
medium sized program to get to the right 
spot in the table. 
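One common way to reach the right spot is to store only the entries below the diagonal, row after row, and compute the position of entry (i, j), with i > j and cities numbered from 0, as i(i-1)/2 + j. The sketch below assumes that layout (the name `tri_index` is hypothetical); on a processor without a multiply instruction, that multiplication is part of what makes the access routine longer.

```python
def tri_index(i, j):
    """Position of the mileage between cities i and j in a packed triangular table.

    Only pairs below the diagonal are stored, row by row:
    (1,0), (2,0), (2,1), (3,0), (3,1), (3,2), ...
    """
    if i == j:
        raise ValueError("a city's distance to itself is not stored")
    if i < j:
        i, j = j, i  # A-to-B and B-to-A share one slot
    return i * (i - 1) // 2 + j


n_cities = 4
packed = [0] * (n_cities * (n_cities - 1) // 2)  # 6 slots instead of 16

print([tri_index(i, j) for i in range(n_cities) for j in range(i)])  # [0, 1, 2, 3, 4, 5]
```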


Access 


The addressing modes of your machine 
warrant study to determine the best way 
to scan tables. If you have a hardware index 
register, that’s usually the best way both in 
terms of speed and programming con- 
venience. Each microprocessor has its idio- 
syncrasies. An 8 bit index will only cover a 
table size of 256 locations. Sometimes, 
though, an index doesn’t modify a full 
address, but only an 8 bit offset. In this case 
the index must hold a full address rather 
than a simple table position. How easy is the 
index to modify as you step through the 
table? An increment command that adds 
one to the index value is of limited value 
if you want to jump 12 locations at a time. 

If indexing isn’t convenient for a given 
job, indirect addressing is the next best bet. 
Put the address of the start of your table 
into an indirect address location; then add to 
it as necessary until you reach the end of the 
table. 

Don’t hesitate to search a table backwards
if it’s convenient. This facilitates
searches when using certain types of in- 
dexing. 


Program Intercommunication 


One program segment can communicate 
with another by means of tables. In fact, 
processors which feature a common memory 
use this technique. When working with an 
interrupt structure, the recommended pro- 
cedure is to have one program prepare a 
table of material for another to pick up. 
This becomes a good way to segment large 
projects into convenient modules. Each 
module can be separately debugged by 
preparing a set of test input tables and 
examining the output tables it produces. 
On very large jobs, this kind of segmen- 
tation is an excellent way to divide work 
among several people. Even online debugging 
becomes easier, since the tables can be 
readily viewed at any time. 


Conclusion 


Tables are a good way to arrange data 
in a compact, visible and easy to modify 
form. New programmers sometimes have 
problems getting used to designing and 
using them, but they are well worth the 
effort.
Hashing is the meat and potatoes of symbol table handling.


A Note About Notation


The routines described in this article are represented in two notations. Figures 1 through 5 show the various algorithms in the Warnier-Orr structured programming discipline. This notation is more fully described in David Higgins’ articles in the PROGRAM STRUCTURE section of this edition. Listings 1 through 5 provide the author’s corresponding 8080 assembly language versions of the programs. ...BL
It is often necessary to convert alphanumeric code into numeric code efficiently. This article describes how to do this using a powerful data structure called the hashed symbol table. Assembly language code is included for the 8080 microprocessor, but the algorithms and structures apply to any computer.

A symbol table is a set of ordered pairs 
called entries. The first element of each pair 
contains a symbol (usually in ASCII) and the 
second contains the object the symbol 
represents. There are three operations which 
are applied to a symbol table: 

• Lookup (also called search): An input symbol, called the key, is compared to the symbol in an entry of the symbol table. When a match occurs, the object associated with the symbol is output. If no match occurs, this condition is indicated;

• Insert: An entry is appended to the symbol table;

• Delete: An entry is removed from the symbol table.

A structure to make these operations easy
and efficient is the object of this article.




Hashed Symbol Table 


John Beetem 


Lookup 


Lookup, if done wrong, can be a very 
time consuming operation. The most funda- 
mental lookup structure is a simple array 
where the entries are placed sequentially in 
memory. If the number of entries is large, 
lookup is quite slow because the key must 
be compared to half of the entries on the 
average. A sorted array of entries can be 
searched by methods such as a binary search, 
which is considerably better (and much
more complicated). But the best method
seems to be one called hashing. 

A hashed symbol table consists of many 
arrays of entries, called buckets (my system 
uses 64 arrays). Each element in a bucket 
has the same hash code for its symbol. A 
hash code is computed from the symbol 
itself using a pseudo random method, such 
as adding the binary representations of all 
the characters in the symbol and using the 
low order six bits of the result. Using a good 
hashing method, the symbols are well 
distributed over the buckets, and each 
bucket is fairly short. 
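The additive hash mentioned above can be sketched in a few lines. This is an illustration of that particular method only (the function name is hypothetical; Listing 2 later uses exclusive OR instead): the character codes are summed and the low six bits select one of 64 buckets.

```python
def additive_hash(symbol):
    """Add the character codes of the symbol and keep the low-order six bits."""
    total = 0
    for code in symbol.encode("ascii"):
        total += code
    return total & 0x3F  # six bits select one of 64 buckets


print(additive_hash("BAT"))                        # 23
print(additive_hash("CAT"), additive_hash("ACT"))  # 24 24
```

Symbols made of the same letters land in the same bucket; that is acceptable, because the bucket is searched sequentially after hashing.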


Figure 1: LEXCMP compares two ASCII strings. My format assumes that the last byte of each string is marked by its sign bit being set. The strings can be of any length, and don’t both have to be the same length. Equality of strings is indicated by returning with the match flag set. Otherwise the match flag is cleared.


To look up a key, the following algorithm
is used:

• Compute the hash code;

• Use the code to find the bucket;

• Use a simple sequential search through the bucket to match the key.

Since each bucket is short, this is an 
efficient way to perform the lookup. 
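The three lookup steps can be sketched as follows. This is an illustration only: it assumes an additive six-bit hash and keeps each bucket as a Python list of (symbol, object) pairs, which sidesteps the storage question the article takes up next; all names are hypothetical.

```python
NUM_BUCKETS = 64


def hash_code(symbol):
    # Illustrative hash: sum of character codes, low-order six bits.
    return sum(symbol.encode("ascii")) & 0x3F


# Each bucket is a short list of (symbol, object) pairs.
buckets = [[] for _ in range(NUM_BUCKETS)]


def insert(symbol, obj):
    buckets[hash_code(symbol)].append((symbol, obj))


def lookup(key):
    """Hash the key, find its bucket, then search that bucket sequentially."""
    for symbol, obj in buckets[hash_code(key)]:
        if symbol == key:
            return obj
    return None  # no match


insert("CAT", 1)
insert("ACT", 2)
print(lookup("ACT"), lookup("TAC"))  # 2 None
```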

To insert or delete an entry, first hash the 
symbol to find the right bucket, then insert 
or delete the entry into or from that bucket.
This last operation is dependent on bucket
structure, and will be discussed presently. 

Storage could be a problem. Would it 
make sense to have 64 different arrays, one 
for each bucket? No, because one bucket 
could become filled while others are empty, 
and it’s silly to run out of space when there 
is plenty left. So it would be nice to store all 
the entries in the same array in memory. 
How does one indicate the bucket structure? 


Linked List 


A linked list consists of a group of things 
called nodes. Each node contains data and 
one or more pointers to other nodes. (A 
binary tree is a linked list.) This structure is 
used to solve our problem as follows: 

Each node contains a symbol table entry 
and the sixteen bit address of the next node 
in the same bucket. There is also a 128 byte 
array containing the sixteen bit addresses of 
the first node in each bucket. An address 
such that the high byte is zero indicates that 
there are no more nodes in that bucket. The logical structure of memory (as the program sees it) is different from the physical structure of memory. The buckets are stored in the same region of memory and there is no “crosstalk” between buckets.


LEXCMP: LDAX B      ;Load A with character addressed by BC.
        CMP  M      ;Compare with character addressed by HL.
        RNZ         ;If not equal, return with zero flag clear.
        INX  B      ;Advance to next character in each string.
        INX  H
        ORA  A      ;If not last character in both strings,
        JP   LEXCMP ;then continue comparison, else:
        XRA  A      ;set zero flag and clear carry flag.
        RET         ;Return.


Listing 1: Subroutine LEXCMP (LEXical CoMPare) compares two ASCII strings. The addresses of the beginning of the strings are stored in the HL and BC registers. The last byte of each string is marked by its sign bit being set. The strings can be of any length, and don’t have to be the same length. Equality of strings is indicated by returning with the zero flag set; otherwise the zero flag is clear.


BUCKET: [block of 128 bytes, initially zero.]

HASH:   XRA  A      ;Clear A.
        XRA  M      ;XOR next character in string.
        INX  H      ;Advance to next character.
        JP   HASH+1 ;If not last character, continue hashing.
        ANI  3FH    ;Use low 6 bits of result as hash code.
        MVI  H,00   ;Load HL with hash code.
        MOV  L,A
        DAD  H      ;Double hash code since addresses are two bytes long.
        LXI  D,BUCKET ;Load DE with address of bucket pointer array.
        DAD  D      ;HL contains address of the pair of bytes containing
                    ;the address of the first node in the bucket.
        RET         ;Return.


Listing 2: Subroutine HASH computes the hash code of the symbol “in” the HL registers by exclusive OR’ing the characters in the string. HASH returns the hash code in A, and the address of the address of the first node in the bucket in HL.


Figure 2: HASH computes the hash code of the symbol by exclusive-OR’ing the characters in the string. HASH returns the hash code and the address of the address of the first node in the bucket.


Figure 3: LOOKUP searches for the key symbol in the symbol table. When the symbol is matched, the address of the parameters represented by the symbol is returned and the no match flag is cleared. If the symbol cannot be found, LOOKUP returns with the no match flag set.


LOOKUP: MOV  B,H    ;Save address of key in BC.
        MOV  C,L
        CALL HASH   ;Hash symbol: HL contains address of the
                    ;address of the first node in the bucket.
LOOP:   MOV  E,M    ;Load DE with address of next node in list.
        INX  H
        MOV  D,M
        MOV  A,D    ;Load A with high byte of address.
        ORA  E      ;If DE is zero (symbol not matched)
        STC         ;then (set carry.
        RZ          ;      Return.)
        MOV  H,D    ;Load HL with address of node.
        MOV  L,E
        INX  H      ;Span (pass over) 2 byte field containing
        INX  H      ;address of next node in list.
        PUSH B      ;Save address of key.
        CALL LEXCMP ;Compare key to symbol in node.
        POP  B      ;Restore address of key into BC.
        RZ          ;Return if successful match. (Carry cleared
                    ;by LEXCMP.)
        XCHG        ;Load HL with address of address of next
                    ;node in list.
        JMP  LOOP   ;Continue search through bucket.


Listing 3: Subroutine LOOKUP searches for the symbol “in” the HL registers (the key) in the symbol table. On the symbol’s first match, the address of the parameters represented by the symbol is returned in HL with carry clear. If the symbol cannot be found, LOOKUP returns with carry set.

Symbols consist of a variable length array 
of bytes containing 7 bit ASCII characters. 
The last character in the symbol is indicated 
by the sign bit being set, whereas the other 
characters have the sign bit clear. This is 
necessary so that both ends of the symbol 
are known. 

A complete node consists of:

• 16 bit address of the next node:
  byte 1 contains the low 8 bits,
  byte 2 contains the high 8 bits;

• n bytes of symbol in ASCII;

• m bytes of parameters represented by the symbol.

We are now ready to look at some code. 


The Routines 


Subroutine LEXCMP (figure 1) is used to 
compare two ASCII character strings. The 
strings need not be of equal length. 

LEXCMP works as follows: the first two characters are compared. If they are not equal, LEXCMP returns with a not equal flag set. If the first two characters are equal, LEXCMP checks if those were the last characters in the strings. If so, LEXCMP returns with the not equal flag cleared; otherwise, the next two characters are compared, and so on.
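The comparison can be sketched in Python, assuming symbols are byte strings whose last byte has the sign bit (0x80) set, as in the node format above. The helper `sign_terminated` and the function name `lexcmp` are illustrative, not the article's code.

```python
def sign_terminated(text):
    """Encode ASCII text with the sign bit (0x80) set on its last byte."""
    data = bytearray(text.encode("ascii"))
    data[-1] |= 0x80
    return bytes(data)


def lexcmp(a, b):
    """Compare two sign-bit-terminated byte strings, as LEXCMP does.

    Characters are compared one pair at a time. Because the terminating sign
    bit takes part in the comparison, strings of different lengths always
    differ at or before the shorter string's last character.
    """
    i = 0
    while True:
        if a[i] != b[i]:
            return False  # not equal
        if a[i] & 0x80:
            return True   # last character of both strings: equal
        i += 1


print(lexcmp(sign_terminated("CAT"), sign_terminated("CAT")))   # True
print(lexcmp(sign_terminated("CAT"), sign_terminated("CATS")))  # False
```

Only one terminator test is needed per character pair: once the characters are known to be equal, a sign bit in one means a sign bit in the other.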

Subroutine HASH (figure 2) computes 
the hash code of a symbol. HASH then 
computes the address of the pointer to the 
correct bucket. 

HASH uses an exclusive-OR function to 
hash the characters. This makes it very easy 
to detect the end of the symbol, as only the 
last character will set the sign bit in the A 
register. BUCKET is the starting address of a 
128 byte array containing the addresses of 
the first node in each bucket. 
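The exclusive-OR hash of Listing 2 can be mirrored in Python as a sketch. It assumes BUCKET starts at address 0000 (the arrangement used in the worked example of table 1); the function names are hypothetical. Because only the last character has its sign bit set, the running XOR shows a set sign bit exactly when the last character has been processed.

```python
BUCKET = 0x0000  # assumed start of the 128-byte bucket pointer array


def sign_terminated(text):
    data = bytearray(text.encode("ascii"))
    data[-1] |= 0x80  # mark the last character
    return bytes(data)


def xor_hash(symbol):
    """XOR the characters until the sign bit appears in the running result.

    Returns (hash code, address of the bucket's first-node pointer).
    """
    a = 0
    for ch in symbol:
        a ^= ch
        if a & 0x80:  # only the last character sets the sign bit
            break
    code = a & 0x3F                 # low six bits: 64 buckets
    return code, BUCKET + 2 * code  # pointers are two bytes long


print([hex(xor_hash(sign_terminated(s))[1]) for s in ("BAT", "CAT", "ACT")])
# ['0x2e', '0x2c', '0x2c']
```

These are the bucket addresses 002E and 002C that appear in the worked example.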

Subroutine LOOKUP (figure 3) searches 
for a key. If the key cannot be found in the 
table, LOOKUP returns with a not found 
flag set; otherwise, LOOKUP returns the 
address of the parameters associated with 
the symbol, and clears the not found flag. 

Subroutine INSERT, shown in figure 4, 
inserts a node into the symbol table. This is 
very easy to do using the linked structure. 
Only two addresses must be moved. The new
node is linked to the old first node in the
bucket, and the address in the BUCKET array
is changed to point to the new node; thus,
the new node becomes the first node in the bucket.

Subroutine DELETE, figure 5, removes a 
node from the symbol table. DELETE 
requires that the node to be deleted was the 
last node inserted into the bucket. This is 
not as severe a limitation as one might think.
(In a compiler for a language such as ALGOL,
symbols go out of existence in the reverse of
the order in which they came into existence;
this is exactly what goes on here.) This
limitation simplifies the DELETE operation
considerably, and also simplifies reclamation
of space (reusing memory freed by deleting nodes).

DELETE moves the address in the link 
field of the first node in the bucket into the 
proper element of the BUCKET array. Thus 
the second node in the bucket becomes the 
first node. Notice that the INSERT and 
DELETE operations are exactly PUSH and 
POP operations, where the stack is organized 
as a linked list instead of a contiguous array. 
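The push/pop behavior of INSERT and DELETE can be sketched with all buckets threaded through one shared node pool. This is an illustration under stated assumptions, not the article's code: the class and method names are hypothetical, nodes are Python lists `[link, symbol, params]` addressed by pool position, deleted nodes are simply abandoned (no space reclamation), and `delete` adds a check that the symbol really is first in its bucket.

```python
NUM_BUCKETS = 64


class SymbolTable:
    """Buckets kept as linked lists threaded through one shared node pool."""

    def __init__(self):
        self.first = [None] * NUM_BUCKETS  # plays the role of the BUCKET array
        self.nodes = []                    # node pool: [link, symbol, params]

    @staticmethod
    def bucket_of(symbol):
        code = 0
        for ch in symbol.encode("ascii"):
            code ^= ch                     # exclusive-OR hash, as in HASH
        return code & 0x3F

    def insert(self, symbol, params):
        """PUSH: the new node links to the old first node and becomes first."""
        b = self.bucket_of(symbol)
        self.nodes.append([self.first[b], symbol, params])
        self.first[b] = len(self.nodes) - 1

    def delete(self, symbol):
        """POP: only the most recently inserted node in the bucket may go."""
        b = self.bucket_of(symbol)
        node = self.first[b]
        if node is None or self.nodes[node][1] != symbol:
            raise ValueError("symbol is not the first node in its bucket")
        self.first[b] = self.nodes[node][0]  # second node becomes first

    def lookup(self, key):
        """Return the parameters of the most recent definition, or None."""
        node = self.first[self.bucket_of(key)]
        while node is not None:
            link, symbol, params = self.nodes[node]
            if symbol == key:
                return params
            node = link
        return None


table = SymbolTable()
table.insert("X", "outer")
table.insert("X", "inner")  # a nested block redefines X
print(table.lookup("X"))    # inner
table.delete("X")           # leaving the block
print(table.lookup("X"))    # outer
```

The usage lines show the block-structure behavior described below: the newer definition hides the older one until it is deleted.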

This set of routines could be the basis for 
any symbolic data handling program. The 
structures are not limited to ASCII and 
could be used for any code, for example, a 
phonetic language. This system is the sym- 
bolic backbone of a compiler or interpreter: 
LISP could be kept very happy. Notice that
you can have a symbol defined many times:
the most recent assignment is valid (it has
“extent”), yet the older ones still exist (they
have “scope”). This is caused by this inser-
tion and deletion method, and is the basis
for block structured languages.


Figure 4: INSERT inserts a node into the symbol table. The new node becomes the first node in the bucket.


INSERT: PUSH H      ;Save address of new node.
        INX  H      ;Move up to symbol:
        INX  H      ;span link bytes.
        CALL HASH   ;Hash symbol: HL contains address of address
                    ;of first node in bucket.
        POP  D      ;Load DE with address of new node’s link field.
        MOV  A,M    ;Get address of first node in bucket.
        STAX D      ;Set link of new node to point to first node
                    ;in bucket.
        MOV  M,E    ;Make new node first node in bucket.
        INX  H      ;Do above to second bytes of addresses.
        MOV  A,M
        MOV  M,D
        INX  D
        STAX D
        RET         ;Return.


Listing 4: Subroutine INSERT inserts the node addressed by HL into the 
symbol table. The new node becomes the first node in the bucket. 


Figure 5: DELETE removes a node from the symbol table. This node must be the first node in the bucket. The second node in the bucket becomes the first node.


DELETE: MOV  C,M    ;Load BC with address of second node.
        INX  H
        MOV  B,M
        INX  H      ;Advance HL to the symbol.
        CALL HASH   ;Hash symbol: HL contains address of address
                    ;of first node in bucket.
        MOV  M,C    ;Make second node in bucket the first node
                    ;in bucket.
        INX  H
        MOV  M,B
        RET         ;Return.


Listing 5: Subroutine DELETE removes the node whose address is in HL 
from the symbol table. This node must be the first node in the bucket. The 
second node in the bucket becomes the first node. 
Table 1: Memory contents in the example, initially and after BAT, CAT and ACT are inserted in turn.

Address   Initial   BAT      CAT      ACT
0000      0000      0000     0000     0000
0002      0000      0000     0000     0000
 ...
002C      0000      0000     1008     1010
002E      0000      1000     1000     1000
 ...
007E      0000      0000     0000     0000
1000      0000      0000     0000     0000
1002      'BA'      'BA'     'BA'     'BA'
1004      'T',00    'T',00   'T',00   'T',00
1006      1234      1234     1234     1234
1008      0000      0000     0000     0000
100A      'CA'      'CA'     'CA'     'CA'
100C      'T',00    'T',00   'T',00   'T',00
100E      4688      4688     4688     4688
1010      0000      0000     0000     1008
1012      'AC'      'AC'     'AC'     'AC'
1014      'T',00    'T',00   'T',00   'T',00
1016      AF00      AF00     AF00     AF00
1018      0000      0000     0000     0000
101A      'TA'      'TA'     'TA'     'TA'
101C      'B',00    'B',00   'B',00   'B',00
101E      6878      6878     6878     6878
1020      0000      0000     0000     0000
1022      'TA'      'TA'     'TA'     'TA'
1024      'C',00    'C',00   'C',00   'C',00
1026      7897      7897     7897     7897
This is one method of storing a hashed symbol table. Assume that the symbols we will be working with are already in memory starting at hexadecimal address 1000. Each entry consists of three parts:

• the address of the symbol which follows this one in the bucket;

• the symbol itself;

• the value represented by the symbol.

For the purposes of this discussion, the symbols are assumed to occupy four bytes (a three letter symbol is followed by a 00 byte) and to be followed by a two byte value.

Column one in table 1 shows how memory is initially arranged. BUCKET
occupies memory from hexadecimal address 0000 to 007E. This allows
arrangement of 64 buckets. All pointers in the symbol and hash tables are
initialized to zero. A pointer value of zero indicates that this symbol is the
last one in that particular bucket since that address is not defined in the
symbol table.


Insertion 


The first symbol that will be entered into the BUCKET table is BAT. The 
symbol BAT hashes to hexadecimal location 002E. The pointer at address 
002E is 0000. This means that there are no symbols in this particular bucket. 
We now set this location equal to hexadecimal 1000 which is the address of 
the symbol. 

The next symbol we wish to insert is CAT. This symbol hashes to hexa- 
decimal location 002C. Since this location is also equal to zero, indicating no 
symbols in the bucket, we point it to CAT at location 1008. 

The third symbol to be inserted (ACT) hashes to the same location that 
CAT did since it contains the same letters. Since there are already symbols in 
this bucket, we search the entire bucket to make sure that this symbol is not 
already contained within the bucket. Since it is not, we will place it at the 
head of the bucket. This is done by having the first pointer in the bucket 
point to this symbol (hexadecimal address 1010). The pointer that ACT has is 
adjusted to hexadecimal address 1008 to point to CAT. CAT’s pointer is still 
0000 indicating that it is the last symbol in the bucket. 


Deletion 


The particular format that we have adopted requires that any symbol that
is to be deleted must be the first node of a bucket. This implies that it was
the last symbol added to that bucket.

Suppose we want to delete the symbol ACT. ACT is the last symbol that
was inserted into the bucket located at hexadecimal address 002C. If the
pointer at this location is changed to point at the second symbol in the
bucket, ACT is effectively eliminated from the hash table.

If this method is employed, only a minimum number of pointers need to be
changed when inserting or deleting a symbol. Insertion requires that the
pointer at the head of the bucket point to the new symbol location, and that
the new symbol point to the node that used to be at the head of the bucket.
Deletion requires changing only the pointer at the head of the bucket from
the first node to the second node of the bucket.
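The worked example can be replayed on a simulated memory. This sketch assumes the layout of table 1 (BUCKET at 0000, 8-byte entries from 1000, pointers stored low byte first as in the node format described earlier) and follows the HASH, INSERT and DELETE listings in spirit; it omits the duplicate-symbol search that precedes an insertion, and all function names are hypothetical.

```python
mem = bytearray(0x1028)  # enough for BUCKET (0000-007F) and five 8-byte entries


def word(addr):
    """Read a 16-bit pointer stored low byte first."""
    return mem[addr] | (mem[addr + 1] << 8)


def set_word(addr, value):
    mem[addr] = value & 0xFF
    mem[addr + 1] = value >> 8


def put_node(addr, symbol, value):
    """Lay out one entry: link (0000), 4 symbol bytes, 2-byte value."""
    data = bytearray(symbol.encode("ascii").ljust(4, b"\0"))
    data[len(symbol) - 1] |= 0x80  # sign bit marks the last character
    set_word(addr, 0)
    mem[addr + 2:addr + 6] = data
    set_word(addr + 6, value)


def bucket_pointer(node):
    """HASH: XOR symbol bytes up to the sign bit, keep 6 bits, double."""
    a, p = 0, node + 2
    while True:
        a ^= mem[p]
        p += 1
        if a & 0x80:
            return 2 * (a & 0x3F)  # BUCKET starts at address 0000


def insert(node):
    head = bucket_pointer(node)
    set_word(node, word(head))  # new node points at the old first node
    set_word(head, node)        # bucket now starts at the new node


def delete(node):
    head = bucket_pointer(node)
    set_word(head, word(node))  # second node becomes the first node


for addr, sym, val in [(0x1000, "BAT", 0x1234), (0x1008, "CAT", 0x4688),
                       (0x1010, "ACT", 0xAF00), (0x1018, "TAB", 0x6878),
                       (0x1020, "TAC", 0x7897)]:
    put_node(addr, sym, val)

for node in (0x1000, 0x1008, 0x1010):  # BAT, CAT, ACT
    insert(node)
print(hex(word(0x2C)), hex(word(0x1010)))  # 0x1010 0x1008

delete(0x1010)          # remove ACT
print(hex(word(0x2C)))  # 0x1008
```

After the three insertions the pointers match the ACT column of table 1, and one pointer change removes ACT again.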
Figure 1: An Opcode Table Organized for Direct Access. Note that with this particular organization the first data byte of each entry is related to the address of the entry within the table, in a sorted sequence. [Diagram: TABLE with entries at offsets +0, +4, +8, +12, +16, +20, +24 and +28.]

Making Hash With Tables 


Hashing is a technique used to speed up 
table searching operations by making posi- 
tion in the table depend upon the data. 
Many newcomers to programming reject 
hashing as an overly complicated technique 
useful only to the designer of exotic systems
software, but this is not the case. Any large 
program, written for fun or profit, may 
include tasks of accessing, storing, or modi- 
fying entries in a table or array. Most game 
playing programs include a number of such 
tasks. Application of hashing techniques can 
often dramatically improve the performance 
of these programs. This article will explore 
the use of hashing (sometimes called key-to- 
address transformation) as a simple but 
effective mechanism for accessing stored 
data. These techniques can be used in 
applications where the data is organized 
randomly and where each item has a unique 
key associated with it. For example, consider a table that contains computer opcode mnemonics and their associated values as used in an assembler; by using the opcode value as a key, this table could be used to determine the mnemonic associated with any particular value. Such a table is an integral part of any disassembler.


Listing 1: Typical 8080 code sequence for a linear search of a table until the first byte of the current table entry matches the value in the accumulator. In this listing, the HL register pair must be preset to the address of the table, the DE register pair must be set with the number of bytes per table entry, the B register must contain the number of entries to search (maximum 255) and the key value sought must be loaded in A. This is by no means the only possible 8080 linear search strategy.

FIND:   CMP  M      ;Check for a match;
        RZ          ;if so then exit;
        DAD  D      ;advance to next table entry;
        DCR  B      ;decrement count;
        JNZ  FIND   ;continue till end;
        JMP  ERR    ;table exhausted, treat as error.

In any computer, a particular entry in a 
table can be specified by the starting address 
of the entry. Locating an item in a table 
implies that the starting address for that 
item must be determined. One possible 
method that can be used to determine the 
address, and by far the most common 
method, is to examine each item sequen- 
tially, starting with the first item, until the 
desired item is located and hence the item 
address determined. This approach is termed 
a linear search and, as you can see from the
8080 subroutine of listing 1, it is simple
to code. The big disadvantage of a linear 
search is that it is costly in terms of 
processing time because, on the average, at 
least one half of the table entries must be 
examined before locating the desired item. If 
the table is moderately large and numerous 
accesses are required, then the table lookup 
processing time will constitute a significant 
part of the total processing time. 

An alternative to the linear search involves storing the information in a sorted fashion based upon the key. However, even the best known algorithms for locating data in a sorted table require an average of about $\log_2 N$ tests, where N is the table size. Therefore, a table with, let’s say, 500 entries requires an average of nine tests to locate an arbitrary item. Although this is a considerable improvement over the linear search, which would require an average of 250 tests to locate an item, hashing techniques require considerably fewer tests than either method, without the added burden of sorting.


Figure 2: A Hash Accessed Table. Note that with the hash algorithm described in the text, three elements of this table map into identical starting entries, resulting in a rehash requirement indicated by the arrows and dotted lines.


The Key 


The fundamental idea behind any hashing 
technique is that instead of searching the 
table to determine the address of a particular 
entry, an attempt is made to calculate the 
address using the key. That is, a subroutine 
is written which, when given any desired 
key, calculates the table location containing 
the item associated with that key. If this 
calculation is successful, then the desired 
item is located with a single search. 

The first step is to determine the key. 
This choice will depend upon the intended 
use of the table. In the opcode table 
mentioned earlier, the opcode value is the 
key since all lookup requests are of the 
form: “What is the mnemonic for the
opcode X?” On the other hand, if this same
table were incorporated in an assembler or 
compiler, then the mnemonic would be the 
key because requests are now of the form: 
“What is the opcode value for mnemonic 
X?”. In all of our examples, we will assume 
that the opcode value is the key. 


Direct Access Hash 


Imagine that there are only a limited 
number of opcode values and it so happens 
that, although the value is eight bits long, 
the opcode is uniquely determined by the 
rightmost three bits. If a table, called 
TABLE, is created with eight 4 byte entries, 
and the mnemonic and value for each 
opcode is placed in the table entry whose 
address is found by multiplying the rightmost
three bits of the opcode by four and
adding the result to the base address of the
table, then a simple subroutine can calculate
the precise location of any entry. That 
subroutine, shown in listing 2 for an 8080, 
simply strips off the rightmost three bits of 
the key, multiplies them by four, and adds 
in the starting address of the table as shown 
in figure 1. Entries are added to the table in 
the same manner. Tables of this type are 
called direct access and are most commonly 
used for conversions; that is, converting 
from one character code to another, from 
opcode values to mnemonics, etc. In many 
direct access tables the actual key is not even 
stored in the table since a comparison is not 
necessary to determine the proper entry. 
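The direct access calculation, ADDR := BASE + 4 * (A & 7), can be sketched in Python. This assumes a hypothetical table base address of 2000 hexadecimal; the second function repeats the computation with the 8-bit steps of the 8080 sequence (add to the low byte, then propagate the carry into the high byte) to show that they agree, including when the table crosses a 256-byte page boundary.

```python
BASE = 0x2000  # hypothetical starting address of TABLE


def entry_address(opcode, base=BASE):
    """ADDR := BASE + 4 * (A & 7): four-byte entries selected by the low 3 bits."""
    return base + 4 * (opcode & 0x07)


def entry_address_8080(opcode, base=BASE):
    """Same result computed with 8-bit steps, as in the 8080 code."""
    a = (opcode & 0x07) << 2             # ANI 7, then RLC twice (no bits wrap)
    low = (base & 0xFF) + a              # ADD L (may produce a carry)
    carry = low >> 8
    high = ((base >> 8) + carry) & 0xFF  # MVI A,0 / ADC H
    return (high << 8) | (low & 0xFF)    # MOV L,A / MOV H,A


print(hex(entry_address(0x0B)))  # 0x200c
```

No comparison is made anywhere: the key alone determines the entry, which is why such tables need not even store the key.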


Listing 2: Typical 8080 code sequence for direct hash with a table of eight entries, each entry being four bytes in length. In a direct hash approach, the actual data value (in this case, a number from 0 to 7) being sought is used to determine the offset in the table directly. Here the calculation is made according to the formula: ADDR := BASE + 4 * (A & 7) where A is the value of the entry being sought, BASE is the starting address of the table, and ADDR is the effective address of the table element involved.

FIND:   LXI  H,TABLE ;HL := table pointer;
        ANI  7       ;extract rightmost three bits;
        RLC          ;multiply by four;
        RLC
        ADD  L       ;add the table address;
        MOV  L,A
        MVI  A,0
        ADC  H
        MOV  H,A     ;HL := entry address;
        RET


Open Hash


The direct access method would obviously break down if certain opcode mnemonics were associated with values whose rightmost three bits were equal. In this case, where direct access is infeasible, the algorithm must be slightly modified. A subroutine is still used to calculate the address, but since it is no longer possible to successfully calculate the location of all entries, some type of searching algorithm must be employed to pinpoint the position of the entry, given the calculated position. The initial predicted position of the table item is called the hash index and the procedure which produces the hash index is called the hashing function. For the remainder of our discussion, HASH is used to denote that subroutine and therefore HASH(K) denotes the hash index for a particular key, K.


Editor’s Note: In this article, we represent several algorithms in a structured pseudo code form appropriate to the discussion. These algorithms are referenced by numbers in brackets, as in [n] for algorithm n. Each algorithm should be thought of as a formal procedure, which in practice would be called as a subroutine.


Figure 3: Folding Keys. When it is desired to retain the significance of all the bits in a key while compressing the total number of bits used, folding by some operation such as addition can be used.


Before considering how the information is initially entered in a hash table, it may be useful to examine the process used to locate an arbitrary entry in a hash accessed table. If KEY is used to denote the key associated with the desired entry, and TABLE, a table consisting of N entries (each of which is B bytes long), then the algorithm to locate the entry that is associated with KEY, using hash techniques, is as follows:

[1]  1.  I, J := HASH(KEY);
     2.  do until (I = J - 1) [worst case end test for search failure];
     3.      if @(TABLE + I * B) = 0 then [element not present, search failure]
     4.          do; call ERROR; return; end;
     5.      if @(TABLE + I * B) = KEY then
     6.          return [the item has been located];
     7.      I := I + 1;
     8.      if I = N then I := 0 [wrap around table space limit];
     9.  end;
    10.  call ERROR [element not present, search failure];


In this algorithm, specified in a structured pseudo code form, step 1 calculates an initial estimate of the location of the item associated with KEY, the hash index. This value is saved in J for the worst case end test in the do until construct of step 2. In steps 3 and 4, the algorithm tests for a null entry end of search criterion and calls an ERROR routine if this is detected. Return to the calling program follows detection and flagging of the search failure condition. Then the algorithm tests to see if the current entry is equal to KEY at step 5; if this condition is found, the algorithm terminates with a return operation at step 6. Otherwise, the next index is calculated at step 7, an end around wrap condition is tested at step 8, and the do loop is closed at step 9 with an end statement. If the loop execution ends through the test on line 2, step 10 is reached and an error condition is flagged before an automatic return, which is assumed after the last line of such a procedure.
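Algorithm [1] can be sketched in Python as follows. This is a hedged illustration, not the article's code: empty entries are represented by `None` rather than by a zero key, the table holds (key, value) pairs, the loop stops once every entry has been examined, and the helper `hash_insert` (used only to build a test table) and the placeholder values are assumptions.

```python
def hash_lookup(table, key, hash_fn):
    """Probe from HASH(KEY), stepping to the next entry and wrapping at the end."""
    n = len(table)
    i = j = hash_fn(key)
    while True:
        entry = table[i]
        if entry is None:
            return None  # empty entry: element not present
        if entry[0] == key:
            return i     # the item has been located
        i += 1
        if i == n:
            i = 0        # wrap around table space limit
        if i == j:
            return None  # back at the start: every entry examined


def hash_insert(table, key, value, hash_fn):
    """Helper for building the table: store at the first empty entry probed."""
    n = len(table)
    i = hash_fn(key)
    for _ in range(n):
        if table[i] is None:
            table[i] = (key, value)
            return i
        i = (i + 1) % n
    raise OverflowError("table is full")


def hash8(k):
    return k % 8  # HASH(K) = REMAINDER(K/8)


table = [None] * 8
for opcode in (0x0B, 0x13, 0x1B):  # all three share the rightmost bits 011
    hash_insert(table, opcode, "op%02X" % opcode, hash8)
print(hash_lookup(table, 0x1B, hash8))  # 5: found on the third probe
print(hash_lookup(table, 0x23, hash8))  # None
```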

Consider again the opcode table example. 
If the hash procedure is defined as HASH(K) 
= REMAINDER(K/8), then each table item 
shown in figure 2 can be located by at most 
three searches using algorithm [1]. 


Defining HASH 


In choosing a hash function, you must 
attempt to define a general procedure, using 
a minimal number of simple computations, 
which produces an even distribution of hash 
indices for a random selection of possible 
keys. If we knew that all op codes were even 
numbers, then the hashing function 
HASH(K) = REMAINDER(K/8) would not 
be efficient, because it will produce only 
even numbers. This simple example illus- 
trates that the hashing function must be 
carefully selected to suit the particular appli- 
cation. It should also be noted that it is not 
necessary for the key to be a numeric value. 
If alphanumeric or other keys are used, the hashing function should ignore the data type and simply perform numeric or logical manipulations of the key as though it were numeric.

One of the most widely utilized, and 
historically the first, hashing function has 
already been mentioned. If N is the size of the table (in terms of the number of entries, not the number of bytes), the hash index is the remainder of the key divided by N. More precisely stated, HASH(K) = REMAINDER(K/N). In a machine such as the 8080, which lacks a divide instruction, this function will be made significantly faster by restricting the length of the table to a power of two (ie: N = 2^M). If N = 2^M, then REMAINDER(K/N) also happens to be the rightmost M bits of K, and a divide operation is no longer required: the remainder is selected by a logical AND of K with N-1.
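To make the power-of-two shortcut concrete, here is a small Python sketch (the function name is illustrative) that replaces the divide with an AND against N-1:

```python
def hash_remainder(key: int, m: int) -> int:
    """HASH(K) = REMAINDER(K/N) for a table of N = 2**m entries.

    Because N is a power of two, the remainder is just the rightmost m
    bits of the key, so a logical AND with N-1 replaces the divide.
    """
    return key & ((1 << m) - 1)


print(hash_remainder(0x34, 3))   # 4, the same as 0x34 % 8
```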

The remaindering function will not 
produce well distributed hash indexes if 
many of the entries end with the same bit 
sequence. This situation is frequently en- 
countered when dealing with alphanumeric 
data. Changing the table size to a prime 
number usually improves distribution, but 
now we are back to the unwanted divide 
operation for calculating the remainder. 
There are two other alternatives to this 
problem. The first is a technique called 
folding as diagrammed in figure 3. This 
method applies the remaindering algorithm 
to the bit string that is obtained by adding 
the upper half of the internal binary repre- 
sentation of the key to the lower half. This 
minor improvement minimizes the effect of 
patterns that may occur within the key. You 
should be careful about improvising on the folding technique. For example, substituting a logical AND for the add sounds good, but will merely make matters worse, because ANDing the two halves biases the result toward zero bits. If in doubt, write a small test program that grinds out hash indices for a variety of keys and examine how evenly they are distributed.
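A sketch of folding in Python, assuming 16-bit keys whose "halves" are the upper and lower bytes and a table of N = 2^M entries with M of at most 8. The sample keys are hypothetical ones chosen to share their low-order bits:

```python
def hash_fold(key: int, m: int) -> int:
    """Folding for a 16-bit key and a table of N = 2**m entries (m <= 8):
    add the upper byte to the lower byte, discard the carry, then apply
    the remaindering step by keeping the rightmost m bits."""
    folded = ((key >> 8) + (key & 0xFF)) & 0xFF   # 8-bit add, carry dropped
    return folded & ((1 << m) - 1)


keys = [0x0100, 0x0200, 0x0300]           # differ only in the upper byte
print([k & 0b111 for k in keys])          # plain remainder: [0, 0, 0]
print([hash_fold(k, 3) for k in keys])    # folded: [1, 2, 3]
```

Replacing the + with & in hash_fold would give 0 for all three keys, which is the degradation warned about above.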


A second method for minimizing the effect of similar bit patterns in the key, best applied to tables of size 2^M, is called squaring. This consists of selecting the center M bits of the number that is obtained by multiplying the key by itself. Since the middle bits of the product depend upon all of the bits in the key, this method generally produces a uniform distribution of hash indices.
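A possible Python rendering of squaring, assuming 16-bit keys (so the square is at most 32 bits wide). Which M bits count as the "center" is a choice; this sketch discards (32 - M)/2 low-order bits, rounded down, and keeps the next M:

```python
def hash_square(key: int, m: int, key_bits: int = 16) -> int:
    """Squaring (mid-square) hash for a table of N = 2**m entries:
    square the key and keep the center m bits of the 2*key_bits-bit product."""
    square = key * key                   # at most 2*key_bits bits wide
    shift = (2 * key_bits - m) // 2      # low-order bits to discard
    return (square >> shift) & ((1 << m) - 1)


print(hash_square(256, 3))   # 4: 256*256 = 0x10000, shifted right 14 bits
```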

Since the squaring method is safest, it 
may appear that one should always use it. 
This is certainly not the case because the 
purpose of hashing is to save processing time 
and although squaring is the most general 
technique, it is unfortunately the slowest 
since it relies on a multiply operation which 
the 8080 and many other small processors 
lack. It is often acceptable to settle on a 
slightly less efficient hash function if such a
function is substantially faster. The guideline 
for selecting the hash function is to employ 
a more complex function only in those 
specific cases where a simple function fails 
to produce an adequate distribution of hash 
indices. But remember, any hash function is 
better than a linear search. Why? A linear 
search is a hash access where HASH(K)=0 
for all values of K; therefore any distribution
is better than none. This degeneracy is 
evident in algorithm [1] when the data item 
sought is not in the table, and the algorithm 
searches every location. 


Multibyte Hash 


Until now, we have tacitly assumed that the entire key can be contained in one byte. This is often impractical, but the hashing concept is easily extended to cover those cases where the key occupies more than a single byte. If the key is contained in byte locations (K1, ..., KJ), then a multibyte hash function, HASHM, can be defined in terms of any of the previous hash functions as HASHM(K,J) = HASH(K1 + ... + KJ). That is, any of the single byte hash functions is applied to the sum of the bytes in the multibyte key, ignoring carries. As you see in figure 4, this is similar to the folding technique just mentioned.
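The multibyte scheme HASHM can be sketched in Python as follows, assuming the single-byte hash is the power-of-two remainder and that the key arrives as a byte string (for example, the characters of a label):

```python
def hash_multibyte(key: bytes, m: int) -> int:
    """HASHM for a table of N = 2**m entries (m <= 8): add all bytes of
    the key with 8-bit arithmetic (carries ignored), then apply the
    single-byte remainder hash to the sum."""
    total = 0
    for b in key:
        total = (total + b) & 0xFF   # keep 8 bits; the carry is discarded
    return total & ((1 << m) - 1)


print(hash_multibyte(b"AB", 3))   # 65 + 66 = 131, rightmost 3 bits -> 3
```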

Another possibility for a multibyte hash, 
which should be used with some degree of 
caution since it may not provide an even 
distribution, is to apply a single byte hash 
function to the last byte (or any other byte 
of your choosing) of the multibyte key. This 
eliminates the time required to add the 
bytes of a multibyte key. As usual, the
programmer is faced with a time versus 
efficiency tradeoff. 


Guidelines


In summary, the sole purpose of a hashing function is to calculate an initial table index for a linear search, given a specific key. There is no one best algorithm, and the number of algorithms available is bounded only by your imagination. The general guidelines to follow when designing your hashing function are:


1. Keep it simple. Remember, the goal is to locate an item in the minimum amount of time. If the perfect hash requires more time than a linear search, it is useless!

2. Ensure an even distribution; beware of weird bit patterns in the key.

3. Check out the operation of the function before employing it as a hash function. There is often an overwhelming urge to give it the smoke test, but hash indices are used to form memory addresses, so it may be difficult to isolate bugs in the hash function after you've incorporated it into a table lookup procedure. Save yourself some time: test the hash function by itself before you test the table lookup subroutines.


Building the Table 


Obviously, for the hash access algorithm 
to operate smoothly, the table items must 
have been entered into the table properly. 
The relative ease with which entries can be 
made in a hashed table is an important 
advantage of hash techniques. Remember, 
even though a sorted search is reasonably 
efficient for locating an entry, the entire 
table must be sorted before any access is 
allowed. Thus, if accesses were to be inter- 
mixed with entries, the algorithm would be 
grossly inefficient due to the amount of 
re-sorting required.





Figure 4: The principle of folding key elements can be extended to a multibyte key. The multibyte hashing scheme might be employed where a key is a character string field. (Diagram: the bytes of a MULTI-BYTE KEY are combined into a single COMBINED KEY.)


Before any entries can be made in the 
hash table, the key field of the table must be 
initialized to some flag value which is not 
encountered as a possible key. If a table 
entry contains this value, then it can be 
assumed that the entry is unoccupied. The 
most common value used to designate an 
empty table entry is the integer zero, and 
assuming this to be the case, the algorithm 
to add an item associated with KEY to the
table of N entries (each B bytes long) is:


[2]  I,J := HASH(KEY);
     do until (I=J-1) [worst case end test for search failure];
        if @(TABLE + I * B) = 0 then
           do;
              [enter the item at (TABLE + I * B)];
              return;
           end;
        I := I+1;
        if I = N then I := 0 [wrap around table space limit];
     end;
     call ERROR [no room left in table];


Notice that the lookup algorithm [1] and 
the entry algorithm [2] are very similar in 
nature. The loop control is identical, and the 
only difference is in the actions taken. It is 
quite possible to make an automatic entry 
occur whenever a key is not found as 
indicated by a null key value found during a 
search. The following algorithm combines 
both operations. 


[3]  I,J := HASH(KEY);
     do until (I=J-1) [worst case end test for search failure];
        if @(TABLE + I * B) = KEY then
           return [the item has been located];
        if @(TABLE + I * B) = 0 then
           do;
              [enter the item at (TABLE + I * B)];
              return;
           end;
        I := I+1;
        if I = N then I := 0 [end wrap around table space limit];
     end;
     call ERROR [if this point is reached, table is full];
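For comparison with algorithm [3], here is a minimal Python sketch of the combined lookup-or-enter operation. It again assumes integer keys, 0 as the unused flag, HASH(K) = REMAINDER(K/N), and a probe counter in place of the I=J-1 test; the function name is hypothetical.

```python
from typing import List

EMPTY = 0  # flag for an unused slot; never a legal key


def find_or_enter(table: List[int], key: int) -> int:
    """Algorithm [3]: return the slot holding key, entering key at the
    first null entry on its search path if it is not already present."""
    n = len(table)
    i = key % n                          # HASH(KEY)
    for _ in range(n):                   # at most N probes
        if table[i] == key:              # the item has been located
            return i
        if table[i] == EMPTY:            # null entry: enter the item here
            table[i] = key
            return i
        i = 0 if i + 1 == n else i + 1   # linear step with wrap-around
    raise RuntimeError("table is full")


table = [EMPTY] * 8
for op in (0x04, 0x24, 0x34, 0x24):      # the second 0x24 is found, not re-entered
    find_or_enter(table, op)
print(table)   # [0, 0, 0, 0, 4, 36, 52, 0]
```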


In addition to adding or locating entries, it may also be necessary to delete entries. To delete an item, you might think that we could merely locate the item and then set the table entry to zero, thus making it available for future entries. However, if that approach were taken, not only would the desired entry be deleted, but other entries might be made inaccessible. The reason that other entries would be lost is that the searching terminates when an unused location is found. As an example, setting the entry at (TABLE + 20) in figure 2 to zero would also make the entry at (TABLE + 24) inaccessible. Therefore, an alternate scheme must be used to delete entries.

Figure 5: Horizontal Organization of Tables. In this method of organization, all the bytes of a data entry are assigned to contiguous addresses. (Diagram: entries 00 NOP, 01 LDR and 02 STR, each stored as a run of consecutive bytes.)

Table 1: Comparison of Table Access Methods. This table gives the results of an experiment with random data to test out the various methods of access. The tables were filled to the percentage levels indicated at the left. A table size of 500 possible entries was used. The access methods shown are described in the text.

The first step is to select a deleted entry flag that is distinguishable from the unused entry flag and is also not allowable as a key. Then, whenever an entry is to be deleted, this new value replaces the entry. The new flag indicates that the entry is available for future additions to the table but does not terminate a search operation. If 0 is used to denote an unused entry and -1 is used to denote a deleted entry, then the complete hashing algorithm is:

[4]   1. I,J := HASH(KEY);
      2. do until (I=J-1) [worst case end test for search failure];
      3.    if @(TABLE + I * B) = KEY then
      4.       do;
      5.          if [entry is to be deleted] then [delete the entry]
      6.             @(TABLE + I * B) := -1;
      7.          return [item has been located];
      8.       end;
      9.    if @(TABLE + I * B) = 0 then [this is a null entry so]
     10.       do;
     11.          [enter the item at (TABLE + I * B)];
     12.          return;
     13.       end;
     14.    I := I+1;
     15.    if I = N then I := 0 [end wrap around table space limit];
     16. end;
     17. call ERROR [if this point is reached, table is full];

This algorithm either locates an item or adds the item to the first available location. If an item is to be deleted, it is first located and then the key field of the table entry is set to -1.

Figure 6: Vertical Organization of Tables. In this method of organization, a multibyte table element is treated as “n” single byte subtables, where “n” is the number of bytes in each entry. Each of the “n” subtables has a length (in bytes) equal to the number of elements in the table.


Collisions


A collision occurs whenever HASH(KEY1) = HASH(KEY2), but KEY1 ≠ KEY2. As discussed earlier, a good hashing function will keep this condition rare, but the problems caused by collisions cannot be ignored. Note, for example, that the hash index for opcodes 04, 24 and 34 in the table shown in figure 2 is 4, and hence these entries collide.

What happens when two entries collide? The only solution we’ve discussed thus far is to search the table, in a circular fashion, from the point of impact, as in algorithms [1] to [4]. In general, the search that follows a collision, good or bad, is called a rehash. The process mentioned above, namely, searching the table in a circular fashion from the point of impact, is called a linear rehash, and as you might expect it falls into the bad category. Other, more efficient algorithms will be discussed later.

If we denote the rehashing algorithm by REHASH, then the general hashing lookup algorithm may be restated in its final form:


[5]   1. I,J := HASH(KEY);
      2. K := 0;
      3. do until (REHASH(I,J)=J) [worst case end test for search failure];
      4.    if @(TABLE + I * B) = KEY then [we have a match so]
      5.       do;
      6.          if [entry is to be deleted] then [delete the entry]
      7.             @(TABLE + I * B) := -1;
      8.          return;
      9.       end;
     10.    if ((K=0) & [deletion or null element @(TABLE + I * B)]) then
     11.       K := I [save first available table entry index];
     12.    if @(TABLE + I * B) = 0 then [this is a null entry so]
     13.       do;
     14.          [enter the item at (TABLE + K * B), next available slot];
     15.          return;
     16.       end;
     17.    I := REHASH(I,J) [REHASH results in 0 <= I < N where N is table size];
     18. end;
     19. call ERROR [if this point is reached, table is full];


The linear rehash that we’ve been using implicitly in [4] as steps 14 and 15 is described as REHASH(I,J) = (I+1)[mod N], where (I+1)[mod N] means that if (I+1) is greater than or equal to N, then N is subtracted from the value (I+1). This ensures that the table is searched in a circular manner. The operation X[mod N], called X modulo N, is used in most rehashing algorithms to limit the range. Mathematically, it is the remainder of X/N; but in each of the rehashing algorithms presented here X is always less than 2N, so X[mod N] can be calculated as described above (ie: subtract N if X is greater than or equal to N). Here again we have avoided the use of a divide operation to provide a more efficient function. Note that step 10 includes a check which reclaims deleted entries, a process not included in algorithm [4].
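The following Python sketch approximates algorithm [5] with the linear rehash plugged in. It departs from the pseudocode in three labeled ways: the first available slot is tracked with None rather than K = 0 (since 0 is also a valid index), the loop counts N probes instead of testing REHASH(I,J)=J, and a delete request for a missing key returns None instead of entering the key. Keys are assumed to be integers, with 0 and -1 reserved as flags; the names are hypothetical.

```python
from typing import Callable, List, Optional

EMPTY, DELETED = 0, -1      # unused and deleted flags; neither is a legal key


def linear_rehash(i: int, n: int) -> int:
    """REHASH(I) = (I+1)[mod N], with the mod as a conditional subtraction."""
    i += 1
    return i - n if i >= n else i


def access(table: List[int], key: int, delete: bool = False,
           rehash: Callable[[int, int], int] = linear_rehash) -> Optional[int]:
    """Sketch of algorithm [5]: locate key (deleting it if asked), or enter
    it in the first available slot. Returns the slot index, or None when
    asked to delete a key that is not present."""
    n = len(table)
    i = key % n                     # step 1: HASH(KEY)
    free: Optional[int] = None      # plays the role of K in steps 2, 10, 11
    for _ in range(n):              # at most N probes
        if table[i] == key:         # steps 4-9: match
            if delete:
                table[i] = DELETED  # leave a flag so later keys stay reachable
            return i
        if free is None and table[i] in (EMPTY, DELETED):
            free = i                # remember the first reusable slot
        if table[i] == EMPTY:       # steps 12-16: null entry ends the search
            if delete:
                return None         # nothing to delete
            table[free] = key       # enter at the first available slot
            return free
        i = rehash(i, n)            # step 17
    raise RuntimeError("table is full")   # step 19: no null entry was met


table = [EMPTY] * 8
for op in (0x04, 0x24, 0x34):
    access(table, op)                # slots 4, 5 and 6
access(table, 0x24, delete=True)     # slot 5 now holds the DELETED flag
print(access(table, 0x34))           # 6: still found past the deleted entry
print(access(table, 0x14))           # 5: the deleted slot is reclaimed
```

Only stateless rehash functions fit the rehash parameter as written; the pseudorandom and quadratic rehashes keep internal state and would need to be wrapped.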


Improved Rehash 


The problem with the simple linear re- 
hash is that the table will not fill uniformly. 
This condition is referred to as clustering 
and causes an increase in the average number 
of tests required to locate an item in the 
table. As an example, a cluster can be seen 
forming at TABLE+16, +20, and +24 in the 
table shown in figure 2. 

There are a number of nonlinear algo- 
rithms which perform the rehash function 
without causing the clustering problems 
mentioned above. Although the computer 
science literature abounds with such algo- 
rithms, a majority of them fall into one of 
three classes. An attempt has been made to 
select the simplest and best from each class 
and present them here. 


Pseudorandom Rehash 


The first class of rehashing algorithms is 
the pseudorandom rehash and is based upon 
a pseudorandom number generator. The 
pseudorandom number generator used is not 
important, but it must be of the non- 
repeating variety. That is, it must generate 
all possible values before any previous value 
is repeated. It must also generate all of the 
integers in the range 0, ..., N-1, where N is the table size. The following simple procedure incorporates a common random number generator and will perform the rehash function for any table of size N = 2^M. The variable R is internal to the rehashing function, but it must be preset to one whenever the function HASH is called (ie: step 1 of algorithm [5]).


[6] REHASH(I,J):
    1. R := REMAINDER(R*5 / (N*4));
    2. REHASH := (R/4 + J)[mod N];


If you’re seeking the most efficient implementation of this one, REMAINDER(R*5/(N*4)) is just the rightmost M+2 bits of R*5, because N = 2^M and 4*N = 2^2 * 2^M = 2^(M+2). Furthermore, the divide operation in step 2 can be replaced by a right shift of two positions. Finally, if you think of R*5 as R*4+R, then it’s easy to see how to reduce that multiply operation to left shift and addition operations.

Let’s look at the sequence generated by this rehash routine. If our table is eight entries long and the initial hash index is, let’s say, 4, then R/4 takes on the values 1, 6, 7, 4, 5, 2, 3 before R returns to its starting value of 1, so the table would be searched in the order 4 (initial index), 5, 2, 3, 0, 1, 6, 7. How does this avoid the clustering situation? If we chose another initial index, say, 5, then the table is searched in the order 5 (initial index), 6, 3, 4, 1, 2, 7, 0. As you see, the entry searched after entry 5 will depend upon the initial index. If the initial index was 4, then 2 is searched after 5; but if the initial index was 5, then 6 follows 5. In a linear search, 6 always follows 5. This dependence upon the initial index is what avoids the clustering.
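Here is a Python sketch of rehash [6] that uses only the shift, AND and add operations described above. It returns the whole probe order for a starting index J so the sequence can be checked against the text; the function name is hypothetical, and the loop stops as soon as the generator brings the index back to J.

```python
from typing import List


def pseudorandom_order(j: int, m: int) -> List[int]:
    """Probe order of rehash [6] for a table of N = 2**m entries, starting
    at hash index j. R is preset to 1, as the text requires."""
    n = 1 << m
    mask = 4 * n - 1                  # keeps the rightmost m+2 bits
    r = 1
    order = [j]
    while True:
        r = ((r << 2) + r) & mask     # R := REMAINDER(R*5/(N*4)), R*5 as R*4+R
        i = (r >> 2) + j              # R/4 as a right shift of two positions
        if i >= n:                    # [mod N] as a conditional subtraction
            i -= n
        if i == j:                    # back at the start: every slot was visited
            return order
        order.append(i)


print(pseudorandom_order(4, 3))   # [4, 5, 2, 3, 0, 1, 6, 7]
print(pseudorandom_order(5, 3))   # [5, 6, 3, 4, 1, 2, 7, 0]
```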


Quadratic Rehash 


A second class of algorithms for rehashing is the quadratic rehash, and these are based upon a quadratic function. The major drawback with most algorithms in this class is that they search only one half of the table, so two different rehashing algorithms are required. The most efficient quadratic rehash, and one which does search the entire table, was first introduced by Colin Day [see bibliography, reference 1]. Day’s algorithm can only be applied to a table whose size is a prime number that produces a remainder of 3 when it is divided by 4 (eg: 7=4*1+3, 503=4*125+3). At first glance, this appears to place a great many restrictions on the allowable size of the table; but don’t despair, because experience will show that a number satisfying the required condition can be found very near any desired value. Be certain that you use an acceptable number, or the procedure will not search all locations of the table. Like the previous rehashing function, this one uses an internal variable, R, which must be preset to (-N) whenever the function HASH is called. The quadratic rehash process is (remember that the mod operation is just a conditional subtraction):


[7] REHASH(I,J):
    1. R := R+2;
    2. REHASH := (I + |R|)[mod N];


If we look at the sequence generated by this procedure, we see that R takes on the values (for a table of size 11=4*2+3) -11, -9, -7, -5, -3, -1, 1, 3, 5, 7, 9, 11. Therefore, if the initial index were 4, the table would be searched in the order: 4 (initial index), 2, 9, 3, 6, 7, 8, 0, 5, 1, 10. One major difference between this algorithm and the random rehash is that this one calculates the next index based on the previous one. The random rehash calculates the next index based on the initial index.
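Day’s rehash [7] can be sketched the same way. This Python version (function name hypothetical) lists the probe order for a starting index J and performs the mod as a single conditional subtraction:

```python
from typing import List


def quadratic_order(j: int, n: int) -> List[int]:
    """Probe order of rehash [7] for a table of n entries, where n is a
    prime that leaves a remainder of 3 when divided by 4."""
    r = -n                       # R is preset to -N when HASH is called
    i = j
    order = [j]
    for _ in range(n - 1):       # N-1 rehashes reach the remaining slots
        r += 2                   # step 1: R := R+2
        i += abs(r)              # step 2: I + |R| is always below 2N ...
        if i >= n:               # ... so [mod N] is a conditional subtract
            i -= n
        order.append(i)
    return order


print(quadratic_order(4, 11))   # [4, 2, 9, 3, 6, 7, 8, 0, 5, 1, 10]
```

With a table size of 13 (a prime of the form 4k+1) the same function revisits some slots and never reaches others, so the choice of table size matters.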


Weighted Increment Rehash 


The last, and probably the simplest, 
method for performing the rehash is called a 
weighted increment [see bibliography, 
reference 2]. This one is unique because it 
uses the hash index to calculate an incre- 
ment which is in turn used to step through 
the table. The table size is again restricted to 
a power of 2, and whenever the function 
HASH is called, the variable R is preset to
(2*J+1)[mod N], where J is the initial hash
index. The weighted increment method is:


[8] REHASH(I,J):
    1. REHASH := (I+R)[mod N];


This process is very much like a linear 
rehash. In fact if R were always set to 1 it 
would be a linear rehash; however R depends 
on the hash index. If our table is eight entries long and the initial index is 5, then R = (2*5+1)[mod 8] = 11-8 = 3 and the table items are searched in the order 5, 0, 3, 6, 1, 4, 7, 2. Since the increment is a constant for any particular hash index, we can improve the basic hash algorithm when using this rehash technique. You will notice that all memory references are of the form (TABLE+I*B), where B is the number of bytes in each entry. We can avoid that multiply by including it in the computation of R. If we let R = ((2*J+1)[mod N])*B, then all of the table references become (TABLE+I). If we also initialize I to TABLE+HASH(KEY)*B, we can make all references as just (I).
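A Python sketch of the weighted increment rehash [8] follows (function names hypothetical). It includes the byte-offset variant just described, in which I and R are pre-multiplied by the entry size B so that each probe is simply TABLE+I; offsets here are relative to TABLE.

```python
from typing import List


def weighted_order(j: int, m: int) -> List[int]:
    """Probe order of rehash [8] for a table of N = 2**m entries."""
    n = 1 << m
    r = (2 * j + 1) & (n - 1)    # R := (2*J+1)[mod N]; always odd
    order = [j]
    i = j
    for _ in range(n - 1):
        i = (i + r) & (n - 1)    # REHASH := (I+R)[mod N]
        order.append(i)
    return order


def weighted_offsets(j: int, m: int, b: int) -> List[int]:
    """Same probe sequence, with I and R pre-multiplied by the entry size B
    so each probe is a byte offset and TABLE+I needs no multiply."""
    n = 1 << m
    size = n * b                          # table length in bytes
    r = ((2 * j + 1) & (n - 1)) * b       # R := ((2*J+1)[mod N])*B
    i = j * b                             # I := HASH(KEY)*B
    offsets = [i]
    for _ in range(n - 1):
        i += r                            # I + R is less than 2*size ...
        if i >= size:                     # ... so one subtraction does the mod
            i -= size
        offsets.append(i)
    return offsets


print(weighted_order(5, 3))        # [5, 0, 3, 6, 1, 4, 7, 2]
print(weighted_offsets(5, 3, 4))   # [20, 0, 12, 24, 4, 16, 28, 8]
```

Because R is odd and N is a power of two, the two share no common factor, which is why the increment steps through every slot before repeating.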


Laying Doubts to Rest 


You might conceivably ask, “What is gained by using a complex rehashing function?”; or if you’re one of the more cynical observers, “Why use hashing at all?”. In an
attempt to answer these questions, a simple 
experiment was performed. First a table of 
approximately 500 entries was filled with 
randomly generated entries and then each 
entry was located in the table using the 
lookup technique under test. This simple 
experiment provides an insight into the 
comparative efficiency of table lookup algo- 
rithms. Table 1 summarizes the results of the 
experiment. This data clearly illustrates that 
there is significant improvement in table 
lookup time when hashing is utilized. Fur- 
thermore, when a complex rehashing algo- 
rithm is incorporated in the search pro- 
cedure, the statistics are again improved. It is 
worth noting again that, although the num- 
ber of tests for a sorted table is not 
tremendously large, the approach is very 
inefficient if the table must be accessed 
before being filled with entries. 

One other surprising fact about the aver- 
age search length (the number of tests 


required) for hash accessed tables is that it 
does not depend upon the length of the 
table. Rather, the search length depends 
only upon the load factor or the percentage 
of occupied items in the table. This means 
that you can expect the average search time 
for a table of size 10,000 to be about the 
same as the search time for a table of size 
500! This is surely not the case with the 
linear or sorted search. While the average 
linear search length skyrockets to 4,500 (for 
a 90% full table of size 10,000), the average 
hash search length remains at less than six! 

Although table 1 seems to indicate that the weighted increment is most efficient, we must be careful not to read too much into these results. The statistics in table 1 were obtained using randomly generated keys in the test program. When actual keys are used, the search statistics will vary somewhat because actual keys are rarely perfectly random. For example, the search length for a weighted increment search is adversely affected by bit patterns in the key. The best way to ensure that you are using the optimal search procedure is to repeat the experiment with a sample of actual keys. If a
finely tuned algorithm is not important, 
then the weighted increment is probably the 
better choice because it is simple and can be 
applied to any format of table. As we will 
see shortly, most of the algorithms work 
best if the table is rearranged in memory. 


Application 


There are a number of “tricks” which can 
be used to improve efficiency. A number of 
them have already been mentioned. Through- 
out our discussion we have assumed that 
each table entry occupies more than a 
single byte. If each table entry is B bytes 
long, then the typical memory reference is 
(TABLE+I*B). It would be desirable to 
eliminate or at least reduce the multiply 
operation. We already discussed how to 
eliminate the multiply if a weighted incre- 
ment rehash is used. Another method to 
eliminate a multiply is table reorganization. 

All of the tables discussed so far were horizontally organized. This means that the items are stored as shown in figure 5. This is the most common table organization. An alternative organization is a vertical organization such as in figure 6. If you have organized your table vertically, then the first byte of an item is addressed by (TABLE+I) and the multiply is gone. All of the other bytes in the item are addressed by (BYTEN+I), where BYTEN is the address of the nth byte of the first item. Thus by organizing the data vertically we eliminate a multiply operation. This vertical arrangement is practical from other aspects also. Consider searching the table for all items containing a specific value in the third byte. Since the third bytes of all the items are stored in consecutive locations, this search operation is simplified.
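To illustrate why the vertical arrangement removes the multiply, the following Python sketch builds both layouts from three hypothetical 4-byte entries and computes byte offsets relative to TABLE (the entry contents and function names are assumptions):

```python
ENTRY_BYTES = 4                                 # B: bytes per entry
entries = [b"NOP\x00", b"LDR\x01", b"STR\x02"]  # hypothetical entries
N = len(entries)

# Horizontal: all bytes of an entry are contiguous (figure 5).
horizontal = b"".join(entries)

# Vertical: byte k of every entry forms its own N-byte subtable (figure 6).
vertical = bytes(entry[k] for k in range(ENTRY_BYTES) for entry in entries)


def horizontal_addr(i: int, k: int) -> int:
    """Offset of byte k of entry i: I*B + k, which needs a multiply by B."""
    return i * ENTRY_BYTES + k


def vertical_addr(i: int, k: int) -> int:
    """Offset of byte k of entry i: BYTEN + I, where BYTEN = k*N is the
    start of subtable k and can be computed once, outside any search loop."""
    byten = k * N
    return byten + i


print(chr(horizontal[horizontal_addr(1, 2)]), chr(vertical[vertical_addr(1, 2)]))  # R R
print(vertical[2 * N:3 * N])   # b'PRR': all third bytes in one contiguous run
```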


Conclusion 


We have tried to show that hashing is not 
nearly as complicated as you might have 
thought. By using these techniques perhaps 
you can regain a valuable slice of your 
microprocessor’s processing load.
GLOSSARY


Clustering: Grouping of elements within a table 
caused by equal hash indices. 


Collision: Two elements with the same hash index. 


Direct access hash: A hash algorithm which pre- 
cludes collision. That is, no two elements have 
identical hash indices. 


Disassembler: A program to translate object code 
to assembly language. Inverse of an assembler. 


Folding: Procedure for randomizing the hash 
index. The upper and lower half of the key are 
added together before the index is calculated. 


Horizontal table: A table whose entries are stored 
sequentially. That is, (entry one, byte one), (entry 
one, byte two), etc. 


Hash index: The initial estimate of the location of 
an entry within the table. 


Hashing: A nonlinear algorithm for storing/retrieving data from a table.


Hashing function: The algorithm or procedure for 
calculating the hash index. 


Key: Field within an entry that is used to locate 
the entry. For example, surnames are the key field 
of the entries of a telephone directory. 


Linear rehash: A method for resolving collisions. The table is searched sequentially from the point of impact.


Linear search: Table search which examines each 
item starting with the first item and proceeding 
sequentially. 


MOD: Remainder of one number divided by 
another. That is, X MOD Y is the remainder of 
X/Y.


Pseudorandom rehash: A method for resolving 
collisions. A nonrepeating random number genera- 
tor is used to determine the next entry to be 
searched. 


Quadratic prime: A prime number which produces 
a remainder of 3 when divided by 4. 


Quadratic rehash: A method for resolving col- 
lisions. A quadratic or second degree function is 
used to determine the next entry to be searched. 


Rehash: Any algorithm for resolving collisions. 


Squaring: Procedure for randomizing the hash 
index. The key is multiplied by itself before the 
hash index is computed. 


Vertical table: A table where corresponding bytes of successive entries are stored sequentially. That is, (entry one, byte one), (entry two, byte one), etc. FORTRAN stores arrays in this manner.


Weighted increment rehash: A method for resolv- 
ing collisions. The hash index is used to determine 
the next entry to be searched. 
“Making Hash With Tables” by Terry Dollhoff [BYTE, January 1977, page 18¹] is a good introduction to hash tables. However, quadratic methods for collision avoidance do not have to be complicated or suffer from “half table search.” If the table length is a power of 2 and the quadratic increment is 3, as in the following simple and fast algorithm, then none of the table will be excluded from the search.

I have been using this scheme since about 1970 but have never seen it reported in the literature. Your readers may write to me for a copy of the proof that it works.


A Quadratic Hash Table 


The following algorithm assumes that the table length is a power of 2, the table words were initialized to VIRGIN, and MASK has a value equal to the table length minus 1.

1. Set DEL to 0.

2. Set I to hash code of KEY.

3. Let I=I.AND.MASK (ie: AND I with MASK).

4. If TABLE(I)=VIRGIN then go to NOTFOUND. (Note that TABLE(I) refers to the contents of location TABLE+I.)

5. If TABLE(I)=KEY then go to FOUND.

6. Let DEL=(DEL+3).AND.MASK.

7. If DEL=0 then go to FULL. (Note that DEL gets back to 0 only after the whole table has been searched.)

8. Let I=(I+DEL).AND.MASK.

9. Go to step 4.

On return to the user’s program via the NOTFOUND or FOUND exits, the index, I, will point to the spot for a new table entry or to the found entry, respectively. The FULL return means the KEY was not found and that the table is full. Note that the value VIRGIN must not equal any possible value of KEY.
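A Python transcription of the letter’s steps 1 to 9 follows. It is a sketch: VIRGIN = -1 and the function name are assumptions, and keys are assumed never to be negative. The usage forces every key to the same hash code to show that the probes still reach every slot of a 16-entry table.

```python
from typing import List, Tuple

VIRGIN = -1   # hypothetical marker; must never equal a real key


def quad_search(table: List[int], key: int, hash_code: int) -> Tuple[str, int]:
    """Steps 1-9 of the letter's quadratic search. Returns (exit, I) where
    exit is 'NOTFOUND', 'FOUND' or 'FULL'."""
    mask = len(table) - 1            # table length must be a power of 2
    delta = 0                        # step 1 (DEL; 'del' is a Python keyword)
    i = hash_code & mask             # steps 2-3
    while True:
        if table[i] == VIRGIN:       # step 4: I marks the spot for a new entry
            return "NOTFOUND", i
        if table[i] == key:          # step 5
            return "FOUND", i
        delta = (delta + 3) & mask   # step 6
        if delta == 0:               # step 7: the whole table has been searched
            return "FULL", i
        i = (i + delta) & mask       # step 8, then back to step 4


table = [VIRGIN] * 16
for key in range(100, 116):          # 16 hypothetical keys, all with hash code 0
    exit_, i = quad_search(table, key, 0)
    table[i] = key                   # each search ends at an unused slot
print(VIRGIN in table)               # False: every slot was reached
print(quad_search(table, 999, 0)[0]) # FULL
```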


¹Page 84 in this edition.
The Care and Feeding of Binary Trees


Many computer applications require 
sorting, searching, or both. While hashing is a 
useful tool for maintaining tables, it does 
not lend itself to particularly dense tables 
and is of no use in sorting. This article 
describes the use of binary trees to perform 
both sorting and searching at high speed 
with modest overhead. 

Although numerous applications require 
sorting and searching, a good example that 
requires both is the label table in an assem- 
bler. During assembly, fast access is required 
each time a label is referenced, and, at the 
end, the table must be sorted if it is to be 
listed in alphabetical order. 

Like all things, binary trees have advan- 
tages and disadvantages. They deal best with 
large amounts of data encountered in 
random order. This property makes them 
ideal for use in assemblers and other applica- 
tions with similar data. Binary trees provide 
a method which is just the opposite of 
sequential searching or bubble sorting, which 
are at their best with small amounts of data. 


Recognizing Binary Trees 


The term ‘binary tree’ refers to a method 
of linking data records together in memory. 
Typically the records remain in memory in 
the same order in which they were encoun- 
tered. Pointers are used to link them together 
in an ordered manner. The creation and use 
of these pointers is the heart of a binary tree. 

In an assembler, for example, each label entry might contain the label itself, a one byte flag and a 16 bit value, as shown in figure 1. In order to be used as a binary tree, each entry must include two address pointers of 16 bits each, for a total of four additional bytes. These pointers are called up and down pointers since they will be used to point to higher and lower entries in the tree (more about that later). Our example now looks like figure 2.

In most discussions of trees, they are 
called ‘trees’ but are drawn upside down like 
roots. Others say this is silly and draw their 
trees right side up. In an effort to please all
and offend none, trees are represented here 
on their sides. This has the advantage that 
up and down references, when applied to 
pointers, actually mean up and down in the 
figure, which is not true with the other 
representations. 

A simple tree structure following our 
example would appear as in figure 3. For 
simplicity, only the label and the pointers 
have been shown. This tree is already built; 
we'll get to the logic on how to build one 
shortly. 





Figure 1. Possible data arrangement for label entries for an assembler label table. Fields: LABEL, FLAG, VALUE.

Figure 2. To use the binary tree structure, each entry must also have two pointers to point to the entry before it and the one after it. Fields: LABEL, FLAG, VALUE, UP POINTER, DOWN POINTER.

Figure 3. Simple binary tree structure. The first label encountered is MAT. This label leads up to GO or down to RUN, both of which end the tree.
The pointers link the tree together. The
up pointer in each entry points to the rest of 
the tree which is above the current item 
while the down pointer points to those items 
below the current entry. 

Null pointers are pointers which don’t 
point anywhere. These are shown in the 
diagram as asterisks (*) and are usually 
represented in memory as zero. Null pointers 
indicate that a particular entry is the end of 
a branch. 


Searching a Binary Tree 


To search for an item in a binary tree, 
start with the first item (MAT in the exam- 
ple) and compare it to the item being looked 
for. If the item being searched for is above the
current entry in the sort sequence, follow
the up pointer; if it is below, follow
the down pointer. The procedure is repeated
until either a match is found or a null 
pointer is encountered. The latter case 
indicates that the item searched for isn’t in 
the tree. The logic for searching such a tree 
is shown in listing 1. 

To search the tree for the word GO, start 
at MAT and perform a comparison. Since 
GO is above MAT, take the up pointer from
MAT, which points to GO, producing a
match and stopping the search.
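The search logic of listing 1 might be sketched in Python as follows. It assumes a label table held in a Python list, pointers stored as list indices, and 0 used as the null pointer (slot 0 is a placeholder); the FLAG and VALUE fields are omitted for brevity. Because Python cannot take the address of a field, the hypothetical `search` function reports the null pointer that ended an unsuccessful search as an (entry index, field name) pair, which plays the role of "address of current up-pointer" in the listing.

```python
from dataclasses import dataclass

NULL = 0  # null pointer


@dataclass
class Entry:
    label: str
    up: int = NULL
    down: int = NULL


def search(table, first, label):
    """Search the tree whose first entry is table[first] (first must not be NULL).

    Returns (True, index) on a match, or (False, (index, field)) naming the
    null pointer that ended the search.
    """
    current = first
    while table[current].label != label:
        entry = table[current]
        if entry.label > label:            # wanted label is above: go up
            if entry.up == NULL:
                return False, (current, "up")
            current = entry.up
        else:                              # wanted label is below: go down
            if entry.down == NULL:
                return False, (current, "down")
            current = entry.down
    return True, current


# The tree of figure 3: MAT (entry 1) leads up to GO and down to RUN.
table = [None, Entry("MAT", up=2, down=3), Entry("GO"), Entry("RUN")]
print(search(table, 1, "GO"))   # (True, 2)
print(search(table, 1, "NUM"))  # (False, (3, 'up'))
```

The unsuccessful search for NUM stops at RUN's null up pointer, which is exactly the pointer an insertion would overwrite.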


Growing Binary Trees 


Once we have the routine to search a tree, 
adding items is easy. To add an item to the
tree, first search for it to make sure it isn’t
already there. Using the example above, to add
the label NUM, first search the tree for it.
Comparing with MAT indicates that
NUM is below, so follow the down pointer,
which points to RUN. Comparing with





(search pointer = address of first tree entry)
DO UNTIL (current entry = required entry)
    IF (current entry > required entry) THEN
        IF (current up-pointer = 0) THEN
            (search pointer = address of current up-pointer)
            (return signaling ‘not found’)
        ELSE
            (search pointer = current up-pointer)
        ENDIF
    ELSE
        IF (current down-pointer = 0) THEN
            (search pointer = address of current down-pointer)
            (return signaling ‘not found’)
        ELSE
            (search pointer = current down-pointer)
        ENDIF
    ENDIF
ENDDO
(return signaling ‘found’)


Listing 1. Binary tree search logic. This routine follows a path through the
binary tree until the label being looked for is found or a null pointer is encountered.
This listing expresses the logic in a “pseudo code” language.
RUN indicates that NUM is above RUN, so
follow the up pointer from RUN. But since
that up pointer is a null pointer, there is no
match and the search is terminated. This
null pointer, however, is the one that should
point to our new entry. To add NUM, simply
build an entry for it at the end of the table
(with both its pointers set to null) and then
adjust the null up pointer from RUN to
point to the new entry for NUM. The tree is
now as shown in figure 4. The logic for
adding a new item to the tree is shown in
listing 2.

The procedure given above tells how to
add items to an existing tree, but doesn’t tell
how to get the tree started in the first place.
To start a tree, take the first item to be
added to the tree and build an entry for it
with both its pointers set to null. Since it is
the first item in the tree, there are no pointers
to it; it becomes the first tree entry, where
every later search begins.
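Listing 2, together with the rule for starting a tree, might look like this in Python. The sketch assumes a label table held in a Python list, pointers stored as list indices, and 0 as the null pointer (slot 0 is a placeholder). `insert` is a hypothetical name; it returns the index of the first tree entry so that the same call can start an empty tree (first entry NULL) or extend an existing one. It includes its own copy of the listing 1 search so that it runs on its own.

```python
from dataclasses import dataclass

NULL = 0  # null pointer


@dataclass
class Entry:
    label: str
    up: int = NULL
    down: int = NULL


def search(table, first, label):
    """Listing 1: (True, index) on a match, else (False, (index, field))."""
    current = first
    while table[current].label != label:
        entry = table[current]
        field = "up" if entry.label > label else "down"
        nxt = getattr(entry, field)
        if nxt == NULL:
            return False, (current, field)
        current = nxt
    return True, current


def insert(table, first, label):
    """Listing 2. Returns (first, status), status being 'added' or 'duplicate'."""
    if first == NULL:                      # starting a tree: no pointers to it
        table.append(Entry(label))
        return len(table) - 1, "added"
    found, where = search(table, first, label)
    if found:
        return first, "duplicate"
    table.append(Entry(label))            # new entry at the end, pointers null
    index, field = where
    setattr(table[index], field, len(table) - 1)  # overwrite the null pointer
    return first, "added"


table = [None]          # slot 0 reserved so that 0 means null
first = NULL
for label in ["MAT", "GO", "RUN", "NUM"]:
    first, status = insert(table, first, label)
```

After the four calls, entry 3 (RUN) has its up pointer set to entry 4 (NUM), which is the arrangement of figure 4; inserting GO a second time leaves the table unchanged and reports a duplicate.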





Figure 4. Inserting another entry to the 
binary tree. The up pointer from RUN is
now pointing to the new label which is lower 
than MAT but higher than RUN. 





Sorting with Binary Trees 


So much for searching. What about 
sorting? Although it may not appear so, 
once a tree has been built it is actually 
sorted as well. This can be seen by placing a 
piece of paper over figure 4 and slowly 
sliding it down. The labels will appear in
sequence because the diagram shows them in 
logical order. Reading them back from the 
computer’s memory isn’t quite this simple, 
since the logical order is given by the poin- 
ters rather than by physical order, but it’s 
not very hard either. 

The logic to read back a tree in order 
consists, in a nutshell, of performing the 
instructions in table 1. 

The complete algorithm for this process
is shown in listing 3. To remember
the path followed, a stack is used. Each stack
entry requires 3 bytes: 2 for a 16-bit pointer
and 1 for a flag to indicate whether the path was
via an up or a down pointer.
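Listing 3 can be sketched in Python as follows. The sketch assumes a label table held in a Python list, pointers stored as list indices, and 0 as the null pointer (slot 0 is a placeholder). The stack holds (pointer, flag) pairs with a flag of 'U' or 'D', mirroring the 3-byte stack entries described above. `read_back` is a hypothetical name, and it collects the labels in a list instead of printing them.

```python
from dataclasses import dataclass

NULL = 0  # null pointer


@dataclass
class Entry:
    label: str
    up: int = NULL
    down: int = NULL


def read_back(table, first):
    """Return the labels of the tree in sorted order, following Listing 3.

    Assumes the tree has at least one entry (first is not NULL).
    """
    out = []
    stack = []                 # (pointer, flag) pairs, flag 'U' or 'D'
    current = first
    while True:
        if table[current].up == NULL:
            out.append(table[current].label)          # up side done: print
            while table[current].down == NULL:
                if not stack:
                    return out
                current, flag = stack.pop()
                while flag == "D":                    # both sides done: back up again
                    if not stack:
                        return out
                    current, flag = stack.pop()
                out.append(table[current].label)      # returned via U: print
            stack.append((current, "D"))
            current = table[current].down
        else:
            stack.append((current, "U"))
            current = table[current].up


# The tree of figure 4: MAT -> up GO, down RUN; RUN -> up NUM.
table = [None, Entry("MAT", up=2, down=3), Entry("GO"),
         Entry("RUN", up=4), Entry("NUM")]
print(read_back(table, 1))   # ['GO', 'MAT', 'NUM', 'RUN']
```

Like the listing, the sketch stops when it needs to back up and finds the stack empty.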


(call the search routine)
IF (the item was found) THEN
    (return signaling ‘duplicate’)
ENDIF
(build an entry for the new item)
(get the pointer left by the search routine)
(use this pointer to store the address of the new entry . . .)
(. . . over the null pointer that terminated the search)
(return signaling completion)


Listing 2. Binary tree data insertion logic. This routine uses the search routine
to find out whether the item is already in the tree. If it is not, the search routine
returns with the search pointer holding the address of the null pointer that ended
the search. The new entry will be added to the tree at this position.





(search pointer = address of first tree entry)
DO FOREVER
    IF (current up-pointer = 0) THEN
        (print the current entry)
        DO WHILE (current down-pointer = 0)
            IF (stack is empty) THEN
                (return)
            ENDIF
            (search pointer = top pointer from stack)
            (search flag = top flag from stack)
            (pop the stack)
            DO WHILE (search flag = ‘D’)
                IF (stack is empty) THEN
                    (return)
                ENDIF
                (search pointer = top pointer from stack)
                (search flag = top flag from stack)
                (pop the stack)
            ENDDO
            (print the current entry)
        ENDDO
        (push the stack)
        (top stack pointer = search pointer)
        (top stack flag = ‘D’)
        (search pointer = current down-pointer)
    ELSE
        (push the stack)
        (top stack pointer = search pointer)
        (top stack flag = ‘U’)
        (search pointer = current up-pointer)
    ENDIF
ENDDO


Listing 3. This list of instructions is the procedure that is followed when
reading back the entries of a binary tree in sorted order.


Start at the root of the tree (‘MAT’ in the example). 

If there is an up pointer, follow it and make a record of the path followed. 

When returning from following the up pointer (or if the up pointer was null), print
the current entry, then follow the down pointer and make a record of the path.

When both pointers have been processed (or if the down pointer is null), back up to
the previous entry.

The sort is done when an attempt is made to back up to the previous entry and there
is no previous entry.


Table 1: Instructions for carrying out a sort of a binary tree. 





Start at the beginning of the tree (MAT).

MAT has an up pointer, so follow it, pushing the address of MAT (the entry being 
come from) and a U flag (to indicate that the path followed an up pointer) onto the 
stack. 

The entry being pointed to now is GO. 

GO has a null up pointer which requires no action. 

Having processed GO’s up pointer, GO is printed. 

GO also has a null down pointer which requires no action. 

Since both of GO’s pointers have been processed, the stack is popped. 

The entry now being pointed to is MAT. 

Since a U flag was popped, the up pointer (but not the down pointer) has been 
examined; therefore MAT is printed. 

MAT has a valid down pointer, so it is followed and the address of MAT and a D flag
(since the path was via a down pointer) are pushed onto the stack.

The entry being pointed to is now RUN. 

RUN has a valid up pointer, so the address of RUN and a U flag are pushed onto the 
stack. 

The entry being pointed to is NUM. 

Since both of NUM’s pointers are null, NUM is printed and the stack is popped.

The entry being pointed to is now RUN.

Since RUN’s up pointer has already been processed, RUN is printed.

RUN’s down pointer is null, so the stack is again popped.

MAT is now being pointed to. 

Having examined both the up and down pointers of MAT, an attempt is made to pop 
the stack. 

Since the stack is empty, it cannot be popped. This signals that all processing is 
complete. 


Table 2. This is a trace of the procedure for reading back and printing the
example tree of figure 4 using the logic routines of listing 3. 
To illustrate this, table 2 traces the
procedure as it reads back and prints
the example tree from figure 4.

This procedure will work for a tree of any 
size as long as there is enough room for the 
stack. The maximum number of stack entries 
required during the sorted readback depends 
on the order in which the data is placed in 
the tree. If the data comes in random
order, the number of stack entries needed
is usually a small multiple of the base 2
logarithm of the number of entries, so allow
a generous margin. A perfectly balanced tree
needs the fewest: a stack with a depth of 16
will handle such a tree containing up to about
64,000 entries. The worst case for stack depth
occurs when the data is already sorted, or
reverse sorted, when read in. This case will
require nearly as many stack entries as there
are items in the tree.
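The effect of input order on stack depth can be checked with a short Python experiment. It relies on an observation about listing 3 rather than a statement from the original: the stack always holds one entry for each pointer followed from the first entry down to the current one, so the deepest stack needed equals the depth of the deepest entry. The hypothetical helper `max_stack_depth` records that depth as each label is inserted; for brevity, pointers here are ordinary Python references, with `None` as the null pointer.

```python
import random


class Node:
    """A tree entry; only the label and the two pointers are kept here."""

    def __init__(self, label):
        self.label = label
        self.up = None      # None plays the part of a null pointer
        self.down = None


def max_stack_depth(labels):
    """Insert labels in the given order and return the depth of the deepest
    entry, i.e. the most stack entries a Listing 3 readback would need."""
    first = None
    deepest = 0
    for label in labels:
        if first is None:           # the first entry needs no stack entry
            first = Node(label)
            continue
        node, depth = first, 0
        while node.label != label:  # a duplicate label is simply skipped
            depth += 1              # one more pointer followed
            field = "up" if node.label > label else "down"
            nxt = getattr(node, field)
            if nxt is None:
                setattr(node, field, Node(label))
                deepest = max(deepest, depth)
                break
            node = nxt
    return deepest


labels = [f"L{i:03d}" for i in range(1000)]
print(max_stack_depth(labels))      # sorted input: 999
shuffled = labels[:]
random.shuffle(shuffled)
print(max_stack_depth(shuffled))    # random input: varies from run to run
```

Sorted input needs one stack entry fewer than the number of items, since the first entry is never pushed; a shuffled run typically lands well below that, though above the base 2 logarithm of 1,000 (just under 10).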


Optimizing Binary Trees 


If the tree routines use only the lower 32K
bytes of address space, the high-order bit of
every entry address is zero. The sort readback
stack can then be reduced to two bytes per entry
by using that high-order address bit in place
of the up or down flag. If this is done, the
high-order bit will probably have to be
masked off before addresses from the stack
are used to access memory.
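A small Python sketch of the two-byte stack entry described above. It assumes, as the text does, that every entry address lies in the lower 32K, so bit 15 is always free. Which flag sets the bit ('D' here) is an arbitrary choice, and `pack` and `unpack` are hypothetical names.

```python
HIGH_BIT = 0x8000          # bit 15, the high-order bit of a 16-bit address
ADDRESS_MASK = 0x7FFF      # the low 15 bits: addresses in the lower 32K


def pack(address, flag):
    """Combine a lower-32K address and a 'U'/'D' flag into one 16-bit word."""
    if not 0 <= address <= ADDRESS_MASK:
        raise ValueError("address is not in the lower 32K")
    return address | HIGH_BIT if flag == "D" else address


def unpack(word):
    """Split a stack word; the high bit is masked off before the address is used."""
    flag = "D" if word & HIGH_BIT else "U"
    return word & ADDRESS_MASK, flag


word = pack(0x1234, "D")
print(hex(word), unpack(word))           # 0x9234 (4660, 'D')
print(len(word.to_bytes(2, "little")))   # 2
```

Each stack entry now fits in a single 16-bit word, saving one byte per entry compared with the 3-byte layout.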

It is possible to balance a binary tree as it
is being built. Such a tree will always require
close to the minimum number of comparisons for a
search and close to the minimum stack
depth during sort readback. On the other
hand, balancing a tree requires two more
bits per entry (hence probably an extra
byte) and is quite complex. The balancing
algorithm is too complicated to include here;
however, a complete description can be
found in The Art of Computer Programming,
Volume 3 (Sorting and Searching), by Donald Knuth.